1.0 Introduction


With the growth of digital space communication in the past decade, the introduction of sophisticated coding techniques has provided efficiency improvements. These have resulted in reductions of required power, or extended communication range, for numerous space missions. While early coding applications were for relatively low data rates, recent emphasis has been on real-time decoders capable of operation at data rates above 1 Mbps and even approaching 100 Mbps. These efforts have resulted in the development of high speed decoders which provide on the order of 4 to 6 dB of coding gain, depending on the data rate, the code rate or bandwidth expansion, and the error probability requirements.

The leftmost columns of Table 1.0.1 summarize the present state of efficiency improvement available with high speed decoders presently in operation or under development. The required ratio of bit energy to noise density, $E_b/N_0$, is given in each case for bit error probabilities of $10^{-4}$ and $10^{-7}$.*

When the data speed requirements are reduced to the levels of deep space applications, which are on the order of 1 Kbps to 100 Kbps, greater coding gains can be achieved. At these reduced speeds, sequential decoding in particular can be shown to operate more efficiently.

*Only convolutional codes are considered here. Block codes, which were common in early applications, are so clearly inferior, both in required complexity and in resulting performance, that their further treatment is not worthwhile for the systems under consideration, other than as outer codes in a concatenated coding system.
When operated at these lower rates, a hard-quantized high speed sequential decoder can work with about 1 dB less $E_b/N_0$, because of the increased number of computations available per bit period. Further performance can be gained at lower speeds by using soft (8 or more level) quantization. This regains most of the 2 dB loss inherent in hard (2 level) quantization.

Also, more efficient Viterbi decoders are possible at re- 
duced data rates, although the improvement in this case 
is not as great. 

The potential performance of low rate decoders is 
shown in the middle columns of Table 1.0.1. For the se- 
quential decoder, we consider a code-rate 1/3 system. 

Assuming a computation speed of 1 Megacomputations/second 
on soft decision data, a 64 K bit buffer, and a 500 bit 
block length with frame resynchronization, we find an 
improvement of about 2.3 dB relative to the hard decision, 
code-rate 1/2, high speed sequential decoder operating 
at 40 Mbps. The improvement is about 1 dB less if both 
are operating at 100 Kbps. For the Viterbi decoder, we 
consider a constraint-length 8, code-rate 1/3 decoder which is 
considerably less complex than the low rate sequential decoder. 

Its performance is equivalent or better for bit error rates above $10^{-4}$, but it becomes progressively worse at lower error rates. Either system could be improved through increased complexity. For the sequential decoder, this would mean a larger buffer and higher computation speed with ECL logic. For the Viterbi decoder, it would mean higher constraint lengths, with greatly increased path and metric memory requirements. Such improvements are very costly and could gain only on the order of 0.5 dB.

A more promising approach at low data rates is the use of concatenated or hybrid coding and decoding techniques. This study deals with the performance and implementation of two particularly promising techniques, shown in the right-hand part of Table 1.0.1. Each is based essentially on one of the decoders just discussed, augmented by an additional device whose complexity is not greater than that of the original decoder. For the concatenated system this device is a block decoder; for the hybrid system it consists of control logic and additional metric calculators. Yet the resulting improvement is much greater than would be possible if the original decoders were simply upgraded by increasing their complexity or speed in the manners discussed above.

Some of the conclusions are summarized in the two rightmost columns of Table 1.0.1. The performance of the two systems is remarkably similar, and the required buffer sizes are approximately the same. The concatenated approach appears to require about one third fewer IC's, and these are of the TTL rather than the MSI ECL logic family. The hybrid system requires ECL circuits because of the high speed factor required of its sequential decoder. These advantages are partially offset by the fact that the concatenated system requires several read-only memory (ROM) and random-access memory (RAM) chips, which are relatively expensive.
Otherwise, it appears that the concatenated system is preferable. It is even cost-competitive with a simple sequential decoder, while achieving approximately a 1 dB performance gain over the latter. All the systems in the three rightmost columns require approximately the same buffer size.

The concatenated system may be inferior to the other two in only two respects. First, consider the synchronization overhead of each system:

- the sequential decoder generally requires about a 30 bit synchronization sequence (tail) for approximately every 500 data bits;
- the hybrid bootstrap decoder requires about a 164 bit synchronization sequence for approximately every 3000 bits;
- the concatenated decoder in the preferred form requires a 4096 bit non-data sequence, consisting primarily of outer code parity checks, every 28,672 data bits.

These long gaps in the data stream may not be significantly disturbing when many users are time-division multiplexed together, but may represent a serious drawback when only one data stream is sent. This problem can almost certainly be alleviated by using a staggered interleaving scheme. Unfortunately, this requires the simultaneous (though still serial) decoding of several outer code words. The second, corollary, drawback concerns decoding delay. In the concatenated system the delay is of the order of 32 to 64 Kbits. For the sequential and hybrid sequential systems, it is only on the order of the 64K bits of buffer storage, which corresponds to only about 7000 bits.
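The overhead figures in the preceding paragraph can be compared directly. The short sketch below recomputes, from the bit counts quoted above, the period between synchronization gaps and the fraction of transmitted bits that carry no user data. The `SCHEMES` dictionary is an illustrative construct, not part of the reported designs.

```python
# Non-data overhead per synchronization period, using the bit counts quoted in the text.
SCHEMES = {
    "sequential": (30, 500),          # ~30-bit tail per ~500 data bits
    "hybrid bootstrap": (164, 3000),  # ~164-bit sequence per ~3000 data bits
    "concatenated": (4096, 28672),    # 4096 non-data bits per 28,672 data bits
}


def overhead_fraction(nondata_bits, data_bits):
    """Fraction of transmitted bits that carry no user data."""
    return nondata_bits / (nondata_bits + data_bits)


for name, (nondata, data) in SCHEMES.items():
    period = nondata + data
    pct = 100.0 * overhead_fraction(nondata, data)
    print(f"{name:17s} period {period:6d} bits, overhead {pct:4.1f}%, gap {nondata} bits")
```

The concatenated period works out to 32,768 bits with 12.5% overhead, against roughly 5 to 6% for the other two. The more serious penalty is therefore the length of the single gap rather than the average overhead.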
Finally, an ideal rate 1/3, eight-level soft decision coded system operating at channel capacity requires a bit energy-to-noise density ratio of -0.3 dB. The two systems under consideration are therefore operating about 2.5 dB from the ultimate capacity (or Shannon limit) of the coding format. From Table 1.0.1 it appears that at $P_b = 10^{-7}$ there are almost 12 dB of ultimate coding gain between the uncoded system and the ideal coded system operating at channel capacity.

This gain is approached in three levels of sophistication:

- The first level (leftmost third) involves coding with rate 1/2 codes, which may operate at up to multi-megabit data rates. Almost half the gain is achievable at this level.
- The second level (middle third) involves code rate 1/3, lower data rates, longer codes for Viterbi decoding, and soft rather than hard decision sequential decoding. It gains an additional 1 to 2 dB.
- The third level, under consideration here, gains another 1.5 to 2 dB at $P_b = 10^{-7}$.

Clearly, another such step-function increase is just not possible. Experience in this study and previously has convinced us of the futility of further attempts to reduce the small remaining gap in achievable coding gain. The next "breakthrough," if it ever occurs, might be worth another 0.5 dB. As will be discussed in Section 4.0, we conclude that, on the basis of present theory and technology, the concatenated or hybrid coding systems under consideration can be realized in a cost-effective manner. We expect them to stand as the ultimate in coding gain for space communication systems far into the foreseeable future.
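The "almost 12 dB" figure can be checked with a few lines of arithmetic if the uncoded reference is assumed to be coherent binary PSK on the additive white Gaussian noise channel, for which $P_b = Q(\sqrt{2E_b/N_0})$. That modulation is our assumption; this section does not state it. The sketch inverts the Q function by bisection and compares the result with the -0.3 dB limit quoted above.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def uncoded_bpsk_ebn0_db(pb):
    """E_b/N_0 in dB at which uncoded coherent BPSK (assumed) reaches bit error probability pb.

    Solves pb = Q(x) for x by bisection (Q is decreasing), then E_b/N_0 = x^2 / 2.
    """
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > pb:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return 10.0 * math.log10(x * x / 2.0)


CAPACITY_LIMIT_DB = -0.3  # ideal rate 1/3, eight-level soft decision figure quoted above

for pb in (1e-4, 1e-7):
    req = uncoded_bpsk_ebn0_db(pb)
    print(f"P_b = {pb:.0e}: uncoded {req:.2f} dB, gap to limit {req - CAPACITY_LIMIT_DB:.2f} dB")
```

Under this assumption the uncoded requirement is about 8.4 dB at $10^{-4}$ and about 11.3 dB at $10^{-7}$. The gap to the -0.3 dB limit at $10^{-7}$ is then about 11.6 dB, consistent with "almost 12 dB".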

This final report is organized as follows. In 
Section 2 we treat concatenated coding and decoding, be- 
ginning with a review of the principles of operation and 
a detailed analysis of performance with various configur- 
ations. We then consider several possible implementations 
and concentrate on a detailed evaluation of the preferred 
hardware implementation. In Section 3, we proceed in the 
same way for hybrid coding and decoding. Section 4 presents 
our conclusions and recommendations. 
2.0 Concatenated Coding and Decoding


Concatenated coding and decoding reduces the number of errors in received data in two or more successive stages. The principle began with Elias' iterative coding procedures (Ref. 1). These were extended for block codes by numerous researchers, the most complete study being that of Forney (Ref. 2).

Pinsker first (Ref. 3) and later Stiglitz (Ref. 4) considered concatenation of convolutional and block codes. They used a block code as the inner (first stage) code in an attempt to improve the channel, so as to increase the computational cutoff rate $R_{comp}$ for the sequential decoder operating on the outer (second stage) code. While this produced interesting theoretical results, it requires a very complex and impractical inner decoder.

A much more reasonable approach is to use the more efficient and powerful code, the convolutional code, internally. For a given complexity, this improves the channel as much as possible for the outer decoder. The outer code may also be convolutional. However, the resulting "super channel", consisting of the original channel with the inner coder and decoder, seems especially well suited to a particular class of block codes over a nonbinary alphabet discovered by Reed and Solomon (Ref. 5). This technique used with Viterbi decoding was investigated by Odenwalder (Ref. 6) and found to yield rather impressive results. In the remainder of this section, we concentrate on this approach.
2.1 Operation

The basic block diagram is shown in Figure 2.1.1. The inner coder-decoder is a short constraint length convolutional coder with a Viterbi (maximum likelihood) decoder. Typically this decoder is operated at an $E_b/N_0$ level sufficient to produce a bit error probability in the range $10^{-2} > P_b > 10^{-3}$. The outer code is a high rate (low redundancy) block code, which then reduces the final block error probability, and consequently the bit error probability, to the desired level. The most efficient codes found for this purpose are the Reed-Solomon (R-S) codes, with a block length of $2^J-1$ symbols over a $2^J$-ary alphabet. The best choice of J appears to be approximately equal to the constraint length of the inner coder.

The interleaving buffers are required because the inner decoder errors tend to occur in bursts, which occasionally are as long as several constraint lengths. The outer decoder is undisturbed by burst errors within a given $2^J$-ary symbol, which corresponds to J bits or about one constraint length. However, its performance is severely degraded by highly correlated errors among several successive symbols; hence the need for interleaving.
2.2 Performance


To evaluate the performance of this concatenated coding system under cost and complexity constraints, two parameters of the inner code are significant. The first is the R-S symbol error probability. The second is the distribution of lengths of consecutive R-S symbol errors, which is needed to determine the interleaver dimensions.

Both experimentally and theoretically, a more directly derived indication of inner code performance is the distribution of error lengths in bits. For a convolutional code of constraint length K, the length of an error burst is naturally defined as the number of bits starting with the initial error and terminating when K-1 consecutive correct bits have been received. Let this distribution of bit error burst lengths be denoted by

$$Q_k = \Pr(\text{at any node an error burst of length } k \text{ terminates}) \qquad (2.2.1)$$

We wish to determine the distribution of lengths of consecutive R-S symbol errors, $P_j$, from the bit error burst distribution $Q_k$.

To determine $P_j$ we must recognize, first of all, that the error bursts on the inner convolutional code are totally asynchronous to the outer code symbol phase. Suppose then that the first incorrect R-S symbol begins m bits prior to the start of the convolutional code bit error burst. Because of the asynchronous nature of the situation, m is a uniformly distributed random variable on the interval $0 \le m \le J-1$. Conditioning on a fixed m, a burst of length k spans exactly j symbols when $(j-1)J < m + k \le jJ$, so that

$$P_j(m) = \sum_{k=(j-1)J-m+1}^{jJ-m} Q_k, \qquad j = 1, 2, \ldots \qquad (2.2.2)$$

where we define $Q_k = 0$ for $k \le 0$. Summing on the variable m, we therefore have

$$P_j = \sum_{m=0}^{J-1} P_j(m) = \sum_{m=0}^{J-1} \sum_{k=(j-1)J-m+1}^{jJ-m} Q_k, \qquad j = 1, 2, \ldots \qquad (2.2.3)$$


If J > K-1, this expression is exact. In that case every subsequence of J bits within a burst must contain at least one error, and hence causes the corresponding R-S symbol to be in error. On the other hand, if J < K-1, some R-S symbol in the sequence may possibly be correct. Equation (2.2.3) then becomes an overestimate at the high end of the distribution.
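Equation (2.2.3) translates directly into a double sum. The sketch below computes $P_j$ from a table of $Q_k$ values, represented here as a Python dictionary (an illustrative choice; the function name is hypothetical). Two degenerate inputs show the bookkeeping. Single-bit bursts can only ever corrupt one symbol, giving $P_1 = J Q_1$. A 9-bit burst always straddles exactly two 8-bit symbols.

```python
def symbol_burst_distribution(Q, J, j_max):
    """Map bit error-burst statistics Q_k to R-S symbol-burst statistics P_j.

    Q     : dict {k: Q_k}; missing keys (including all k <= 0) are treated as 0.
    J     : bits per R-S symbol.
    j_max : largest symbol-burst length to compute.
    Implements (2.2.3): P_j = sum_{m=0}^{J-1} sum_{k=(j-1)J-m+1}^{jJ-m} Q_k.
    """
    P = {}
    for j in range(1, j_max + 1):
        total = 0.0
        for m in range(J):  # offset of the burst start within the first incorrect symbol
            for k in range((j - 1) * J - m + 1, j * J - m + 1):
                total += Q.get(k, 0.0)
        P[j] = total
    return P


example_Q = {1: 1e-3, 2: 4e-4, 5: 2e-4, 9: 5e-5, 14: 1e-5}  # hypothetical values
print(symbol_burst_distribution(example_Q, J=8, j_max=3))
```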


To obtain the R-S symbol error probability from the error length distribution, we need to weight $P_j$ by the number of symbol errors in each case. As pointed out above, we take all consecutive symbols to be in error. The superchannel symbol error probability is therefore

$$\pi_s = \sum_{j=1}^{\infty} j P_j \qquad (2.2.4)$$


Also, from (2.2.3) we can obtain the desired interleaving length. For example, suppose we require that the ultimate (outer code) error probability be $P_b$. Then the interleaver length L, in R-S symbols, should be such that

$$P_L < P_b \qquad (2.2.5)$$
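Given the $P_j$ values, (2.2.4) and (2.2.5) are one-liners. In the sketch below the symbol-burst distribution is a made-up, purely illustrative example, and the function names are hypothetical. The interleaver length is taken as the smallest L whose $P_L$ falls below the target, which assumes $P_j$ keeps decreasing beyond that point.

```python
def superchannel_symbol_error(P):
    """Equation (2.2.4): pi_s = sum over j of j * P_j."""
    return sum(j * p for j, p in P.items())


def interleaver_length(P, pb_target):
    """Smallest L (in R-S symbols) with P_L < pb_target, as in (2.2.5).

    Returns None if none of the supplied P_j falls below the target.
    """
    for L in sorted(P):
        if P[L] < pb_target:
            return L
    return None


# Hypothetical symbol-burst distribution, for illustration only.
P_example = {1: 4e-3, 2: 1e-3, 3: 2e-4, 4: 3e-5, 5: 4e-6, 6: 5e-7, 7: 6e-8, 8: 7e-9}
print("pi_s =", superchannel_symbol_error(P_example))
print("L for P_b = 1e-7:", interleaver_length(P_example, 1e-7))
```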


Finally, assume a long enough interleaver so that we can neglect error dependencies. The output bit error probability can then be bounded as follows. For an E-error-correcting R-S outer code, a R-S block error occurs when more than E symbol errors occur in the block. When this happens, the R-S decoder identifies at most E symbols as being in error, and so alters at most E symbols. So, if the superchannel causes i symbol errors in the block, $E+1 \le i \le 2^J-1$, at most i+E symbol errors will result. Thus, the concatenated code symbol probability of error can be upper bounded by

$$P_s \le \frac{1}{2^J-1} \sum_{i=E+1}^{2^J-1} (i+E) \binom{2^J-1}{i} \pi_s^i (1-\pi_s)^{2^J-1-i} \qquad (2.2.6)$$
Since some of the bits in an incorrect symbol may be correct, the bit probability of error is less than or equal to the symbol probability of error. The symbol errors caused by the R-S decoder will have about half their bits in error. Those caused by the superchannel will typically have from 0.25 to 0.40 of their bits in error, depending on the particular inner code and channel. Here we will simply upper bound the bit probability of error by the symbol probability of error. Thus,

$$P_b \le \frac{1}{2^J-1} \sum_{i=E+1}^{2^J-1} (i+E) \binom{2^J-1}{i} \pi_s^i (1-\pi_s)^{2^J-1-i} \qquad (2.2.7)$$
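Under the same independence assumption, (2.2.7) can be evaluated numerically once $\pi_s$ is known. The sketch below sums the bound term by term using Python's `math.comb`. The function name and the parameter values in the usage lines (J = 8, E = 16, and the list of $\pi_s$ values) are illustrative only.

```python
import math


def concatenated_bit_error_bound(pi_s, J, E):
    """Upper bound (2.2.7) on the output bit error probability.

    pi_s : superchannel (inner decoder output) R-S symbol error probability.
    J    : bits per R-S symbol; the code length is N = 2**J - 1 symbols.
    E    : number of correctable R-S symbol errors.
    Assumes independent symbol errors (ideal interleaving).
    """
    N = 2 ** J - 1
    total = 0.0
    for i in range(E + 1, N + 1):
        # i superchannel errors leave at most i + E symbol errors after decoding
        prob_i = math.comb(N, i) * pi_s ** i * (1.0 - pi_s) ** (N - i)
        total += (i + E) / N * prob_i
    return total


for pi_s in (0.005, 0.01, 0.02, 0.04):
    print(f"pi_s = {pi_s}: P_b <= {concatenated_bit_error_bound(pi_s, 8, 16):.3e}")
```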


To cover the data rates of interest and to provide 
the performance data needed to optimize this system for 
various complexity constraints, the following inner codes 
were simulated. 


1) K=7, R=1/2 with code generators 1111001, 1011011

2) K=8, R=1/2 with code generators 11111001, 10100111

3) K=8, R=1/3 with code generators 11110111, 11011001, 10010101

4) K=8, R=1/7 with code generators 11111001, 10100111, 11110111, 11011001, 10010101, 10011111, 11100101
The code generators in Cases 1, 2, and 3 are those obtained by Odenwalder in Reference 6. These code generators were chosen to minimize the bit probability of error at large $E_b/N_0$ ratios. However, in the range of $E_b/N_0$ values used here, other codes could yield better results.

The code generators in Case 4 were obtained from the code generators of Odenwalder's rate 1/3 code, his rate 1/2 code, and the reciprocals of his rate 1/2 code. This yields a code with a free distance of 38. That is close to the upper bound of 40 which Heller (Reference 7) has obtained on the free distance of K=8, R=1/7 codes.
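The free distances quoted here can be checked by a shortest-path search over the code trellis. The sketch below makes two assumptions. First, each generator bit string from the list above is treated as a K-bit integer. Second, the same bit-ordering convention is used for all generators; reversing them all together reverses time and leaves the free distance unchanged. The search forces the encoder out of the all-zero state with a 1, then uses Dijkstra's algorithm to find the lightest path back. Its output for Case 4 can be compared with the free distance of 38 quoted above. Function names are hypothetical.

```python
import heapq


def parity(x):
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def free_distance(generators, K):
    """Free distance of a rate 1/n feedforward convolutional code.

    generators : list of K-bit integers, one per output bit.
    The state is the K-1 most recent input bits. Dijkstra's algorithm finds the
    lowest-weight path that leaves the all-zero state and first returns to it.
    """
    mask = (1 << (K - 1)) - 1

    def step(state, u):
        reg = (state << 1) | u                   # K-bit shift-register contents
        weight = sum(parity(reg & g) for g in generators)
        return reg & mask, weight

    start, w0 = step(0, 1)                       # force divergence with an input 1
    best = {start: w0}
    heap = [(w0, start)]
    while heap:
        d, s = heapq.heappop(heap)
        if s == 0:
            return d                             # first remerge popped is the minimum
        if d > best.get(s, float("inf")):
            continue                             # stale heap entry
        for u in (0, 1):
            ns, w = step(s, u)
            if d + w < best.get(ns, float("inf")):
                best[ns] = d + w
                heapq.heappush(heap, (d + w, ns))
    return None


CASES = {
    "K=7, R=1/2": (7, ["1111001", "1011011"]),
    "K=8, R=1/2": (8, ["11111001", "10100111"]),
    "K=8, R=1/3": (8, ["11110111", "11011001", "10010101"]),
    "K=8, R=1/7": (8, ["11111001", "10100111", "11110111", "11011001",
                       "10010101", "10011111", "11100101"]),
}
for name, (K, gens) in CASES.items():
    print(name, free_distance([int(g, 2) for g in gens], K))
```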

These simulations were for convolutional coding sys- 
tems with practically implementable Viterbi decoders 
(Reference 8) using 8 levels of receiver quantization and 
a path length memory of 32 bits. The bit error burst length 
statistics were computed and equations 2.2.3 through 2.2.7 
were used to compute the R-S symbol probability of error, 
the distribution of lengths of consecutive R-S symbol 
errors, and the bit probability of error bound. 

Figures 2.2.1 through 2.2.4 give the concatenated code bit probability of error bound for a K=7, R=1/2 convolutional code and 6, 7, 8, and 9 bits per R-S symbol, respectively. They show that for a fixed alphabet size and probability of error, there is an optimum number of correctable errors.

This optimum arises from a rate trade-off. If the outer code is designed to correct too many errors, the added parity checks lower the outer code rate. For a fixed overall $E_b/N_0$, the inner code then operates at a lower $E_b/N_0$. The resulting increase in superchannel symbol error probability more than offsets the larger error-correcting ability of the outer code.

These curves also show that in some cases it may be desirable to design the outer decoder to correct fewer than the optimum number of errors. Such a system would require a larger $E_b/N_0$ ratio to achieve a specified probability of error, but the decoder would be faster and easier to implement.

Figure 2.2.4. Concatenated Coding Performance with a K=7, R=1/2 Inner Code and 9 Bits/R-S Symbol.
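The rate penalty behind this optimum can be made explicit. Ignoring the single synchronization symbol per row, an outer code with 2E parity symbols has rate $(2^J-1-2E)/(2^J-1)$. For a fixed overall $E_b/N_0$, the inner decoder sees an $E_b/N_0$ lower by that ratio. The worked numbers below are simple arithmetic for J = 8.

```latex
R_{\text{out}} = \frac{2^J - 1 - 2E}{2^J - 1}, \qquad
\left(\frac{E_b}{N_0}\right)_{\text{inner}}
  = \left(\frac{E_b}{N_0}\right)_{\text{overall}} + 10\log_{10} R_{\text{out}} \quad \text{(dB)}

J = 8:\quad
E = 8 \;\Rightarrow\; 10\log_{10}\tfrac{239}{255} \approx -0.28\ \text{dB}, \qquad
E = 16 \;\Rightarrow\; 10\log_{10}\tfrac{223}{255} \approx -0.58\ \text{dB}
```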

Figures 2.2.5 through 2.2.8 summarize the performance 
of this concatenated coding system for the four convolutional 
inner codes, various alphabet sizes, and near optimum outer 
code error correcting ability. 
Figure 2.2.6. Summary of Concatenated Coding Performance with a K=8, R=1/2 Inner Code and a $2^J$-Symbol E-Error-Correcting Outer Code.
Figure 2.2.8. Summary of Concatenated Coding Performance with a K=8, R=1/7 Inner Code and a $2^J$-Symbol E-Error-Correcting Outer Code.
2.3 Implementation Procedure


At data rates up to 100 Kbps, hardware implementation 
of constraint length 7 and 8 Viterbi decoders is relatively 
straightforward. LINKABIT has implemented a K=7, R=l/2 
Viterbi decoder* using only 85 IC's and the implementation of 
K=8 Viterbi decoders is documented in References 8 and 9. 

So most of the system design here is concerned with the outer 
coder-decoder and the interleaving buffers. For the present 
purpose, the inner coder-decoder can be regarded as part of 
the channel. The inner code rate and constraint length have 
virtually no effect on the outer code design, except to the 
small extent that longer constraint lengths cause longer er- 
ror bursts and hence require longer interleaving. 

* It is estimated that a K=8, R=1/3 Viterbi decoder can be implemented for data rates up to 100 Kbps with 150 TTL IC's.

The basic outer code parameters are summarized in the code structure diagram of Figure 2.3.1. The parameters are:

- I, the degree of interleaving, chosen sufficiently large to ensure the independence of successive horizontal R-S symbols;
- E, the guaranteed number of correctable R-S symbol errors;
- 2E, the required number of parity checks.

Each of the I rows in this array represents a R-S code word of $2^J-1$ J-bit symbols, followed by a J-bit segment of a synchronization sequence. This assumes, of course, that the data to be transmitted can be interrupted periodically for the insertion of the (2E+1)JI parity and synchronization bits. This will be the case, for example, when several users are time-division multiplexed together.

The data is assumed to be presented to the encoder in blocks of $IJ(2^J-1-2E)$ information bits. Each block is followed by a period where the 2EIJ parity bits and the IJ-bit synchronization sequence can be inserted. The encoded bits are read out of the array in blocks of J bits (one R-S symbol), one column at a time, and fed to the inner convolutional encoder.
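This array-and-column readout is a block interleaver. The sketch below models the I rows as Python lists of abstract symbols, an illustrative representation in which the synchronization symbol is treated like any other column. It shows why a burst of up to I consecutive superchannel symbol errors touches each R-S word at most once after de-interleaving. The function names are hypothetical.

```python
def interleave(rows):
    """Read an I x n code array column by column (transmission order)."""
    I, n = len(rows), len(rows[0])
    return [rows[i][c] for c in range(n) for i in range(I)]


def deinterleave(stream, I):
    """Inverse of interleave(): received symbol idx returns to row idx % I."""
    n = len(stream) // I
    rows = [[None] * n for _ in range(I)]
    for idx, sym in enumerate(stream):
        c, i = divmod(idx, I)
        rows[i][c] = sym
    return rows


def burst_hits_per_row(I, start, length):
    """Number of symbols in each row corrupted by a burst of consecutive channel symbols."""
    hits = [0] * I
    for idx in range(start, start + length):
        hits[idx % I] += 1
    return hits


# I = 16 rows of 256 symbols (255 R-S symbols plus one synchronization symbol).
print(max(burst_hits_per_row(16, 100, 16)), max(burst_hits_per_row(16, 100, 40)))
```

With I = 16, a 16-symbol burst costs each code word at most one symbol error. A 40-symbol burst costs at most three.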

Code synchronization is obtained using the IJ-bit 
synchronization sequence of Figure 2.3.1 and the synchron- 
ization ability of the Viterbi decoder. The Viterbi decoder 
provides inner code node synchronization and phase ambiguity 
resolution as described in Reference 8. Then the IJ-bit 
sequence of superchannel symbols is used to obtain code 
array, and thus R-S symbol, synchronization. However, due 
to the bursty nature of the superchannel, several code ar- 
rays may have to be examined to obtain the code array syn- 
chronization. 

2.3.1 Encoder and Interleaver Design 

The encoding and interleaving operations can be accomplished as shown symbolically in Figure 2.3.2. This basic encoder is the most efficient for a cyclic code with 2E parity checks when $2E < 2^J-1-2E$ (see Figure 6.5.5 of Reference 10). The double lines represent J-bit signal flow, and the additions and multiplications are over GF($2^J$). The generator polynomial is

$$g(D) = g_0 + g_1 D + \cdots + g_{2E-1} D^{2E-1} + D^{2E} \qquad (2.3.1)$$
where the coefficients $g_i$ are from GF($2^J$). In particular, suppose the field is generated by a primitive element $\alpha$, and $\alpha, \alpha^2, \alpha^3, \ldots, \alpha^{2E}$ are roots of every code word polynomial. Then

$$g(D) = \prod_{i=1}^{2E} (D - \alpha^i) \qquad (2.3.2)$$

We will restrict our attention to this class of R-S codes throughout this report.
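A software model of (2.3.2) and of the encoder's parity computation is short. The sketch below assumes GF($2^8$) generated by the primitive polynomial $M(D)$ of (2.5.2), with roots $\alpha, \ldots, \alpha^{2E}$ as in (2.3.2). It also assumes a systematic code word with the 2E parity symbols in the low-order positions. That placement, and all function names, are illustrative assumptions rather than the layout of Figure 2.3.2. The parity loop is the usual division-by-g(D) shift register. In GF($2^8$), $D + \alpha^i$ equals $D - \alpha^i$.

```python
PRIM = 0x11D  # M(D) = 1 + D^2 + D^3 + D^4 + D^8; bit i is the coefficient of D^i

# Anti-log (EXP) and log (LOG) tables for GF(2^8); EXP is doubled to avoid a modulo.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(two_e):
    """g(D) = prod_{i=1}^{2E} (D + alpha^i), coefficients lowest degree first."""
    g = [1]
    for i in range(1, two_e + 1):
        root = EXP[i]
        new = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            new[j] ^= gf_mul(c, root)   # c * alpha^i stays at degree j
            new[j + 1] ^= c             # c * D moves to degree j + 1
        g = new
    return g


def rs_encode(message, two_e):
    """Systematic encoding of message symbols (highest degree first).

    Returns the code word lowest degree first, parity in positions 0..2E-1.
    """
    g = generator_poly(two_e)
    parity = [0] * two_e                # contents of the division shift register
    for m in message:
        fb = m ^ parity[-1]             # feedback symbol
        for j in range(two_e - 1, 0, -1):
            parity[j] = parity[j - 1] ^ gf_mul(fb, g[j])
        parity[0] = gf_mul(fb, g[0])
    return parity + message[::-1]


def poly_eval(c, x):
    """Evaluate c[0] + c[1] x + ... by Horner's rule."""
    acc = 0
    for coef in reversed(c):
        acc = gf_mul(acc, x) ^ coef
    return acc


msg = [(7 * k + 3) % 256 for k in range(223)]   # arbitrary 223-symbol message
codeword = rs_encode(msg, 32)                   # E = 16, N = 255
print(len(codeword), [poly_eval(codeword, EXP[i]) for i in range(1, 5)])
```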

In an actual implementation the input and output 
are a sequence of binary symbols, so a serial-to-parallel
operation must be performed at the input to the parity 
computation section and a parallel-to-serial operation 
must be performed before the outputs are fed to the con- 
volutional encoder. A description of a hardware imple- 
mentation of the encoder and interleaver is given in a 
later section. The important point is that the entire 
code array of Figure 2.3.1 does not have to be stored, 
only the 2EIJ parity bits need to be stored. 

2.3.2 Unscrambler and Decoder Storage Implementation 

The major storage requirement in this concatenated 
coding scheme is in the receiver unscrambler where the 
sequence of received R-S symbols must be grouped into R-S 
words and the decoded R-S symbols must be arranged so that 
they are presented to the data sink in the proper sequence. 
Figure 2.3.3 illustrates a method of implementing this un- 
scrambling operation. This implementation operates as follows. 
The first received symbol goes to the first register in the upper set of registers, the second received symbol to the second register, and so on, until the I-th symbol is stored in the I-th register. Then the (I+1)-th symbol is shifted into the first register, and the procedure is continued until the registers are filled. Referring to the code array of Figure 2.3.1, it can be seen that this procedure puts the first I R-S words in the upper I registers.

When these registers are filled, all of the switches are changed to their other position, and the input symbols are shifted into the lower set of registers. Meanwhile the words in the upper registers are shifted into the R-S decoder in the end-around manner shown. Each corrected symbol is shifted back into the same (i-th) register it came from. After all the shifts have been completed, these registers contain a corrected version of the original I words.

Then, when the lower registers are filled, the positions of the switches are changed again, and the words in the lower registers are shifted through the R-S decoder. The incoming symbols are shifted into the upper registers, and the symbols shifted out are the decoded, properly sequenced symbols.

This implementation has the advantage that the R-S decoder is independent of the interleaver. Other interleaving procedures could reduce the storage by nearly half, but at the cost of more complex control and staggered access to the R-S decoder. Investigation of these procedures has shown them to be less cost effective than the present one.
2.3.3 Reed-Solomon Decoding Procedure


A R-S decoder can be implemented in four steps:

1. Calculate the syndromes from the received sequence.

2. Use the Berlekamp Algorithm to find the error locator polynomial $\sigma(D)$.

3. Use a Chien search to find the roots and hence the locations of the errors.

4. Find the values of the errors.

The received word from the output of the inner decoder will be denoted

$$y(D) = \sum_{n=0}^{2^J-2} y_n D^n \qquad (2.3.3)$$

where $y_{2^J-1-i}$, $1 \le i \le 2^J-1$, represents the i-th received symbol. If $\alpha$ is a primitive element which generates the field, then the syndromes can be calculated by

$$S_i = y(\alpha^{i+1}) = \sum_{n=0}^{N-1} y_n \alpha^{(i+1)n}, \qquad 0 \le i \le 2E-1 \qquad (2.3.4)$$

where N is the symbol block length of $2^J-1$. Thus each syndrome can be calculated with a single register. Each successive received R-S symbol is added into the initially empty register, the sum is multiplied by $\alpha^{i+1}$, and the result is returned to the register to await the next received symbol. Figure 2.3.4 illustrates this procedure for the i-th syndrome.
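The register procedure of Figure 2.3.4 is Horner's rule. The sketch below assumes GF($2^8$) with $M(D)$ from (2.5.2), carried over from Section 2.5; the function name is hypothetical. The register is multiplied by $\alpha^{i+1}$ before each new symbol is added. We chose that ordering so that the register ends holding exactly $y(\alpha^{i+1})$. Received symbols arrive highest degree first, as in (2.3.3).

```python
PRIM = 0x11D  # M(D) = 1 + D^2 + D^3 + D^4 + D^8 (assumed field generator)

EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def syndromes(received, two_e):
    """S_i = y(alpha^(i+1)), i = 0..2E-1.

    `received` lists y_0, y_1, ..., y_{N-1} (lowest degree first). The first
    symbol to arrive from the channel is y_{N-1}, so we iterate in reverse.
    """
    S = []
    for i in range(two_e):
        a = EXP[i + 1]
        reg = 0                          # initially empty register
        for y in reversed(received):
            reg = gf_mul(reg, a) ^ y     # multiply by alpha^(i+1), then add next symbol
        S.append(reg)
    return S


# Example: all-zero code word with a single error of value 0x53 at position n = 10.
word = [0] * 255
word[10] = 0x53
print(syndromes(word, 32)[:4])
```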

The Berlekamp Iterative Algorithm for computing the error locator polynomial, $\sigma(D)$, from the syndromes is well documented by Berlekamp (Reference 11) and Massey (Reference 12). This algorithm is equivalent to synthesizing the minimum length shift register, over GF($2^J$), that generates $S_0, \ldots, S_{2E-1}$. The resulting tap coefficients are the coefficients of the error locator polynomial. We will use the notation and follow the block diagram given in Reference 10, Figure 6.7.4.
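For a software model of step 2, the shift-register-synthesis form of the algorithm (as usually stated following Massey) fits in a few lines. The sketch below is that generic form, not a transcription of the block diagram in Reference 10. It again assumes GF($2^8$) with $M(D)$ from (2.5.2), and it returns $\sigma(D)$ lowest degree first with $\sigma_0 = 1$. The two-error pattern in the example is hypothetical.

```python
PRIM = 0x11D  # M(D) = 1 + D^2 + D^3 + D^4 + D^8 (assumed field generator)

EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]


def berlekamp_massey(S):
    """Shortest linear feedback shift register generating S_0, ..., S_{N-1}.

    Returns sigma(D) = 1 + sigma_1 D + ... + sigma_L D^L (lowest degree first),
    which is the error locator when at most E errors occurred.
    """
    N = len(S)
    C = [1] + [0] * N       # current connection polynomial
    B = [1] + [0] * N       # connection polynomial before the last length change
    L, m, b = 0, 1, 1       # register length, steps since change, last nonzero discrepancy
    for n in range(N):
        d = S[n]            # discrepancy between S_n and the register's prediction
        for i in range(1, L + 1):
            d ^= gf_mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        coef = gf_mul(d, gf_inv(b))
        T = C[:]
        for i in range(N + 1 - m):
            C[i + m] ^= gf_mul(coef, B[i])   # C(D) <- C(D) - (d/b) D^m B(D)
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1]


# Syndromes (2E = 32) for two errors: value 5 at position 3, value 77 at position 100.
errors = {3: 5, 100: 77}
S = [0] * 32
for n, e in errors.items():
    for i in range(32):
        S[i] ^= gf_mul(e, EXP[(n * (i + 1)) % 255])
sigma = berlekamp_massey(S)
print(sigma)
```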

The Chien search determines whether a given symbol is in error by evaluating the polynomial

$$\sigma(D) = 1 + \sigma_1 D + \cdots + \sigma_E D^E \qquad (2.3.5)$$

at all inverse powers of the primitive field element $\alpha$. If

$$\sigma(\alpha^{-n}) = 1 + \sum_{i=1}^{E} \sigma_i (\alpha^{-n})^i \; \begin{cases} \neq 0, & \text{there is no error in the } n\text{th symbol,} \\ = 0, & \text{there is an error in the } n\text{th symbol,} \end{cases} \qquad n = 1, 2, \ldots, 2^J-1-2E.$$

This search can be implemented as shown in Figure 6.7.5 of Reference 10.
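In hardware, the Chien search avoids re-evaluating $\sigma$ from scratch. Register i holds $\sigma_i \alpha^{-in}$ and is multiplied by the constant $\alpha^{-i}$ at each step. The sketch below models that register update in software, assuming GF($2^8$) with $M(D)$ from (2.5.2); the function name is hypothetical. For simplicity it scans all exponent positions n = 0, ..., 254 rather than only the range given above.

```python
PRIM = 0x11D  # M(D) = 1 + D^2 + D^3 + D^4 + D^8 (assumed field generator)

EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def chien_search(sigma, N=255):
    """Exponents n (0 <= n < N) with sigma(alpha^{-n}) = 0, found Chien-style.

    Register i holds sigma_i * alpha^{-i n}. Each step multiplies it by the fixed
    constant alpha^{-i}, so no general polynomial evaluation is needed.
    """
    regs = list(sigma[1:])                                      # regs[i - 1] <-> sigma_i
    step = [EXP[(255 - i) % 255] for i in range(1, len(sigma))]  # alpha^{-i}
    locations = []
    for n in range(N):
        total = 1                                               # sigma_0 = 1
        for r in regs:
            total ^= r
        if total == 0:
            locations.append(n)
        regs = [gf_mul(r, s) for r, s in zip(regs, step)]
    return locations


X1, X2 = EXP[3], EXP[100]                 # hypothetical errors at positions 3 and 100
sigma = [1, X1 ^ X2, gf_mul(X1, X2)]      # sigma(D) = (1 + X1 D)(1 + X2 D)
print(chien_search(sigma))
```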
Figure 2.3.4. Procedure for calculating the i-th syndrome.

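After the errors have been located with the Chien search, the error values must be calculated. If E or fewer errors have occurred, the error values are given by the formula (Reference 10)

$$e_n = \frac{A(\alpha^{-n})}{\sigma'(\alpha^{-n})}, \qquad n = n_1, n_2, \ldots \qquad (2.3.6)$$

where $n_i$ is the location of the i-th error. The polynomial

$$\sigma'(D) = \sigma_1 + \sigma_3 D^2 + \sigma_5 D^4 + \cdots + \sigma_F D^{F-1}, \qquad F = \begin{cases} E, & E \text{ odd} \\ E-1, & E \text{ even} \end{cases} \qquad (2.3.7)$$

is the formal derivative of $\sigma(D)$ over a field of characteristic 2, and

$$A(D) = \left[S(D)\,\sigma(D)\right]_0^{E-1} = S_0 + (S_0\sigma_1 + S_1)D + (S_0\sigma_2 + S_1\sigma_1 + S_2)D^2 + \cdots + (S_0\sigma_{E-1} + S_1\sigma_{E-2} + \cdots + S_{E-1})D^{E-1} \qquad (2.3.8)$$

Here $S(D) = S_0 + S_1 D + \cdots + S_{2E-1} D^{2E-1}$, and $[\cdot]_0^{E-1}$ denotes the terms of degree 0 through E-1.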
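Equations (2.3.6) through (2.3.8) can be checked numerically. The sketch below assumes GF($2^8$) with $M(D)$ from (2.5.2) and syndromes defined as in (2.3.4); all names are hypothetical. It forms A(D) from the syndromes and $\sigma(D)$, takes the characteristic-2 derivative by keeping the odd-index coefficients, and divides. The example builds syndromes for a known two-error pattern, so the recovered values can be compared with the inserted ones.

```python
PRIM = 0x11D  # M(D) = 1 + D^2 + D^3 + D^4 + D^8 (assumed field generator)

EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _k in range(255):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _k in range(255, 510):
    EXP[_k] = EXP[_k - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]


def poly_eval(c, x):
    """Evaluate c[0] + c[1] x + ... (lowest degree first) by Horner's rule."""
    acc = 0
    for coef in reversed(c):
        acc = gf_mul(acc, x) ^ coef
    return acc


def error_values(S, sigma, locations, E):
    """e_n = A(alpha^-n) / sigma'(alpha^-n) for each error location n, per (2.3.6)-(2.3.8)."""
    # (2.3.8): A_i = sum_{j=0}^{i} sigma_j S_{i-j}, kept for degrees 0..E-1
    A = [0] * E
    for i in range(E):
        for j in range(min(i, len(sigma) - 1) + 1):
            A[i] ^= gf_mul(sigma[j], S[i - j])
    # (2.3.7): formal derivative in characteristic 2; even-index terms vanish
    dsigma = [sigma[i] if i % 2 == 1 else 0 for i in range(1, len(sigma))]
    values = {}
    for n in locations:
        x = EXP[(255 - n) % 255]        # alpha^{-n}
        values[n] = gf_mul(poly_eval(A, x), gf_inv(poly_eval(dsigma, x)))
    return values


# Hypothetical two-error pattern: position n -> error value.
errors = {3: 5, 100: 77}
S = [0] * 32                            # 2E = 32 syndromes, S_i = sum of e * alpha^{n(i+1)}
for n, e in errors.items():
    for i in range(32):
        S[i] ^= gf_mul(e, EXP[(n * (i + 1)) % 255])
X1, X2 = EXP[3], EXP[100]
sigma = [1, X1 ^ X2, gf_mul(X1, X2)]    # (1 + X1 D)(1 + X2 D)
print(error_values(S, sigma, [3, 100], 16))
```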
2.4 Part Software and Part Hardware Decoder Implementation


Several parts of the concatenated decoder are ideally 
suited to hardware implementation. As noted previously, the 
hardware implementation of constraint length 7 and 8 Viterbi 
decoders is relatively straightforward at speeds of less than 
100 Kbps. Also, the unscrambler of Figure 2.3.3 can be easily 
implemented in hardware, but it would require a large amount 
of storage to implement in software. So these operations 
should clearly be done in hardware. 

The R-S decoder can be efficiently implemented entirely 
in hardware or in part hardware and part software. The most 
efficient implementation will depend on the required speed and 
the code parameters. 

The performance curves indicate that high rate R-S codes should be used. In that case, the slowest parts of the R-S decoding are steps 1 and 3, which have to be performed for each received symbol. This indicates that these steps should be performed in hardware. However, step 3 would then sit between a software step 2 and a software step 4. The interfacing problem in going from software to hardware and back may make it desirable to perform step 3 in software also.

To estimate the speeds of steps 2, 3, and 4 of the R-S decoder in software, we wrote a computer program to perform these steps. One of the problems in writing such a program is to find efficient ways of storing, adding, and multiplying field elements. Field elements over GF($2^J$) can be represented as powers of a primitive field element $\alpha$, or as polynomials of degree J-1 or less over GF(2). In this program we represented the field elements by J-bit integers, with the bits corresponding to the coefficients in their polynomial representation. Field addition is accomplished with a bit-by-bit exclusive-OR operation.

Field multiplication and division are performed using field log and anti-log tables. The log table lists the corresponding power of $\alpha$ for the integer representations of the $2^J-1$ non-zero field elements. The anti-log table lists the integer field element representations for the powers of $\alpha$. With these tables, two non-zero field elements are multiplied (divided) by adding (subtracting) their logs modulo $2^J-1$ and looking up the result in the anti-log table.
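The table-driven arithmetic just described can be sketched as follows. The primitive polynomial is an assumption here: this section does not name one, so we borrow $M(D)$ from (2.5.2). The tables are sized for J = 8, and the function names are hypothetical.

```python
def build_tables(prim=0x11D, J=8):
    """Anti-log (EXP) and log (LOG) tables for GF(2^J) generated by `prim`.

    prim = 0x11D encodes M(D) = 1 + D^2 + D^3 + D^4 + D^8 (assumed here).
    """
    size = (1 << J) - 1
    exp = [0] * size
    log = [None] * (size + 1)          # the log of the zero element is undefined
    x = 1
    for k in range(size):
        exp[k] = x                     # alpha^k as a J-bit integer
        log[x] = k
        x <<= 1                        # multiply by alpha
        if x >> J:                     # degree reached J: reduce by the primitive polynomial
            x ^= prim
    return exp, log


EXP, LOG = build_tables()
ORDER = len(EXP)                       # 2^J - 1 non-zero elements


def gf_add(a, b):
    return a ^ b                       # bit-by-bit exclusive-OR


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by the zero field element")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % ORDER]


print(hex(gf_mul(0x53, 0xCA)), hex(gf_div(0x53, 0xCA)))
```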

Appendix A gives a listing of a Fortran and an assembly 
language version of this program. The decoding speeds of the 
various steps in the assembly language program are given in 
Table 2.4.1 for two sets of R-S code parameters. 

These times are for the LINKABIT, Digital Scientific 
META-4 Computer with a one microsecond core memory cycle. 

Each time is based on the time to decode three arbitrarily 
chosen sets of E error locations and values. For the cases 
timed there was less than a 3% variation in these times. 

Table 2.4.1 lists the largest of the three times. 

This table shows that, at least for the two R-S 
decoders timed, a serial software implementation of steps 
2, 3, and 4 is too slow. If the Chien search (step 3) is
performed in hardware and two minicomputers are used for 
steps 2 and 4, the decoding speed is limited by the speed 
of the Berlekamp Algorithm (step 2). Table 2.4.1 indicates 
that such an implementation would be satisfactory with lower 
speed requirements or for codes with smaller guaranteed 
error-correcting abilities. 

This program could be sped up by perhaps as much as a factor of 10 by using microprogramming techniques. If this were done, steps 2, 3, and 4 of the R-S decoding procedure could probably be serially implemented at up to 100 Kbps for the $2^7$-symbol 8-error-correcting code. However, a hardware implementation appears desirable for the more powerful $2^8$-symbol 16-error-correcting code.
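A useful yardstick for Table 2.4.1 is the average time available per R-S word. The sketch below computes it from the code parameters alone. It assumes the data rate counts only information bits and ignores the synchronization and parity gaps, during which the decoder would in practice gain some extra time. The measured step times themselves are in Table 2.4.1 and are not reproduced here; the function name is hypothetical.

```python
def time_per_word(data_rate_bps, J, E):
    """Average seconds between successive R-S words at the given information rate.

    Each word carries 2^J - 1 - 2E information symbols of J bits each.
    """
    info_bits = J * ((1 << J) - 1 - 2 * E)
    return info_bits / data_rate_bps


for J, E in ((7, 8), (8, 16)):
    t = time_per_word(100e3, J, E)
    print(f"2^{J}-symbol, {E}-error-correcting code at 100 Kbps: {1e3 * t:.2f} ms per word")
```

At 100 Kbps this gives about 7.8 ms per word for the $2^7$-symbol code and about 17.8 ms for the $2^8$-symbol code. A serial software implementation of steps 2 through 4 has to fit within these budgets.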
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2.5 Hardware Implementation 


The present discussion of the hardware implementation is limited to a system with a $2^8$-symbol* 16-error-correcting R-S code and an interleaver length of 16. In Section 2.2 it was shown that, for this alphabet size and the desired range of error probabilities, 16 is the optimum number of correctable errors. The computer simulation also showed that in this case an interleaving length of 16 was sufficient for error probabilities down to $10^{-8}$. Here we show that this system can be implemented in hardware at a reasonable cost. The design principles are the same for systems with other high-rate, low-speed R-S decoders.

First, we discuss the hardware implementation of some of the basic field operations. Then we present an outline of a hardware implementation with an estimate of the number of integrated circuit chips required to accomplish the operations.

2.5.1 Hardware Implementation of Field Operations 

As in the software implementation, let the GF($2^8$) field elements be represented by polynomials of degree less than 8 in $\alpha$. That is, a field element $Y$ is represented as

$$Y = \sum_{i=0}^{7} Y_i \alpha^i \qquad (2.5.1)$$

where the coefficients $Y_i$ are binary. Also, in order to obtain specific circuits for performing field multiplication, let the GF($2^8$) field be generated by a field element $\alpha$ whose minimal polynomial is the primitive polynomial

$$M(D) = 1 + D^2 + D^3 + D^4 + D^8 \qquad (2.5.2)$$

The only criterion used in selecting this primitive polynomial is that it have minimum weight, which in this case is 5.

* This is a particularly convenient field size, since the field elements can then be stored in 8-bit shift registers.

One method of multiplying two non-zero field elements 
is to look up their logarithms in a log table, add the logs 
modulo 255, and look up the result in an anti-log table. 

Each log and anti-log table look-up can be implemented with a 256 x 8 read-only memory (ROM), and the addition can be implemented with two chips. In general, to multiply two arbitrary field elements, a test must be made to determine whether either is zero and, if so, the output is set to zero. Thus, excluding control circuitry, 7 chips are required*.

*This is reduced when a variable element is multiplied by a fixed element (such as in polynomial evaluation, where the fixed element is a polynomial coefficient), since then we can simply store the logarithm of the fixed element rather than the element itself, thus avoiding one ROM and, if the fixed element is non-zero, one zero-test chip.
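The saving described in the footnote can be sketched in Python, assuming the field of Eq. (2.5.2). When one factor is a fixed non-zero coefficient, only its logarithm is stored. The product then needs one log look-up, one modulo-255 addition, one anti-log look-up, and a single zero test on the variable input. The names `make_tables` and `mul_by_fixed` are illustrative.

```python
POLY = 0x11D  # M(D) = 1 + D^2 + D^3 + D^4 + D^8 as a bit mask


def make_tables(poly=POLY):
    """Log/anti-log tables for GF(2^8); log[0] is an unused placeholder."""
    log, antilog, x = [0] * 256, [0] * 255, 1
    for k in range(255):
        antilog[k], log[x] = x, k
        x = (x << 1) ^ (poly if x & 0x80 else 0)   # x * alpha reduced mod M(D)
    return log, antilog


LOG, ANTILOG = make_tables()


def mul_by_fixed(v, log_c):
    """v * c for a fixed non-zero coefficient c, given only log_c = log(c)."""
    if v == 0:                                     # zero test on the variable input only
        return 0
    return ANTILOG[(LOG[v] + log_c) % 255]         # one log ROM, adder, one anti-log ROM
```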
Another method (Reference 11) of multiplying two field elements, U and V, is illustrated in Figure 2.5.1. Initially the two field elements to be multiplied are stored in the U and V registers, and the Z register is set to zero. The U register is wired to multiply by $\alpha$, the V register is a storage register which can be shifted to the right, and Z is an accumulator register. The multiplier operates as follows. Depending on the lowest bit of V, U is either added or not added into Z. Then the U and V registers are shifted and the process is repeated. After 8 steps Z contains $\sum_{i=0}^{7} V_i (U\alpha^i)$, the desired product.
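The register-level procedure can be modeled bit for bit. The sketch below assumes M(D) of Eq. (2.5.2) as the reduction polynomial. In it, `times_alpha` plays the role of the hard-wired U register, `v >>= 1` models the right-shifting V register, and `z` is the accumulator. Both function names are hypothetical.

```python
POLY = 0x11D   # M(D) = 1 + D^2 + D^3 + D^4 + D^8, bit i = coefficient of D^i


def times_alpha(u):
    """U register step: multiply by alpha, i.e. shift left and reduce by M(D)."""
    u <<= 1
    if u & 0x100:
        u ^= POLY
    return u


def shift_add_mul(u, v):
    """U * V by 8 conditional additions, as in the U/V/Z register multiplier."""
    z = 0                      # Z accumulator cleared
    for _ in range(8):
        if v & 1:              # lowest bit of V decides whether U is added into Z
            z ^= u             # addition in GF(2^8) is bitwise exclusive-OR
        u = times_alpha(u)     # U <- U * alpha
        v >>= 1                # V shifted right
    return z                   # z = sum_i v_i (u alpha^i) = u * v
```

Only shifts and exclusive-ORs are involved, which is why this multiplier uses cheaper chips than the ROM-based one.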

Figure 2.5.2 gives an implementation of this field multiplication procedure. In this and the following implementation diagrams, L denotes low, H high, and X irrelevant (don't care). Excluding control circuitry, this implementation requires 8 chips. However, the chips required here are less costly than those in the previous implementation.

The best way of obtaining the inverse of a field element is to look up the answer in a table containing the $2^8-1$ inverses. This can be implemented with one 256 x 8 ROM.
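The following Python sketch shows how the contents of the 256 x 8 inverse ROM could be generated offline, assuming the field of Eq. (2.5.2). Since $\alpha^{255} = 1$, the inverse of $\alpha^k$ is $\alpha^{255-k}$. Address 0 has no inverse and is simply left at 0 here, which is a hypothetical convention.

```python
def inverse_rom(poly=0x11D):
    """ROM image holding the inverses of the 255 non-zero elements of GF(2^8)."""
    log, antilog, x = [0] * 256, [0] * 255, 1
    for k in range(255):
        antilog[k], log[x] = x, k
        x = (x << 1) ^ (poly if x & 0x80 else 0)
    rom = [0] * 256                               # address 0 unused
    for u in range(1, 256):
        rom[u] = antilog[(255 - log[u]) % 255]    # alpha^(-k) = alpha^(255 - k)
    return rom, log, antilog


INV, LOG, ANTILOG = inverse_rom()
```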
Figure 2.5.2 Alternate Field Multiplication Implementation.
2.5.2 Hardware Encoder and Interleaver Design 


In section 2.3.1 we described the general procedure 
for implementing the encoder and interleaver. Figure 2.5.3 
gives an outline of a hardware implementation of this pro- 
cedure. Random-access memories (RAM's) provide the parity 
symbol storage and a read-only memory (ROM) provides the 
storage for the logs of the generator polynomial coefficients. 
The main difference between this implementation and the pro- 
cedure shown in Figure 2.3.2 is that here the multiplications 
are performed in series instead of in parallel as indicated 
in Section 2.3.1. That is, for each input symbol, 32 cycles 
through this circuitry are required to update all of the 
parity symbols for that R-S word. Then the RAM selects the 
next set of 32 parity symbols and the same procedure is re- 
peated for the next input symbol. This serial computation 
procedure, of course, takes longer than the parallel procedure, 
but it is fast enough to obtain the required coding speeds 
and it has far fewer parts. 

Above each block in this diagram is an estimate of 
the number of TTL chips required to accomplish the operation. 
The composite RAM shown requires four 1024 x 1 RAM chips and 
must be clocked twice to obtain the desired 8-bit output. 

The field multiplication is performed using the logarithmic procedure described in the previous section. However, the complexity of this multiplier is reduced somewhat by storing the logarithms of the $g_i$ coefficients instead of their polynomial representations. If any coefficient is zero, the value 255 can be substituted, since the largest logarithm is 254. The dotted lines indicate that if either multiplier input is zero, the output is zero.

This diagram does not include the control or synchroni- 
zation circuitry. It is estimated that 8 and 4 chips, respec- 
tively, are required to accomplish these operations. 
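The serial encoder can be sketched in Python under several assumptions that this section does not fix. The generator polynomial is taken as $g(x)=\prod_{j=1}^{32}(x+\alpha^j)$; Section 2.3 defines the actual roots. Symbols are assumed to be interleaved round-robin over I = 16 words. The class and function names are hypothetical. Zero coefficients are stored as log 255, as described above.

```python
# GF(2^8) log/anti-log tables for M(D) = 1 + D^2 + D^3 + D^4 + D^8 (mask 0x11D).
LOG, ANTILOG = [0] * 256, [0] * 255
_x = 1
for _k in range(255):
    ANTILOG[_k], LOG[_x] = _x, _k
    _x = (_x << 1) ^ (0x11D if _x & 0x80 else 0)

ZERO_LOG = 255   # stored in place of log(0); real logarithms run 0..254


def gf_mul_log(v, log_g):
    """v times the coefficient whose stored logarithm is log_g."""
    if v == 0 or log_g == ZERO_LOG:       # a zero on either input forces a zero output
        return 0
    return ANTILOG[(LOG[v] + log_g) % 255]


def generator_logs(two_t=32, first_root=1):
    """ROM contents: logs of g_0..g_{2t-1} for g(x) = prod_j (x + alpha^j),
    j = first_root .. first_root + two_t - 1 (root choice assumed)."""
    g = [1]
    for j in range(first_root, first_root + two_t):
        nxt = [0] * (len(g) + 1)
        for i, c in enumerate(g):
            nxt[i + 1] ^= c                    # c * x
            nxt[i] ^= gf_mul_log(c, j % 255)   # c * alpha^j
        g = nxt
    return [LOG[c] if c else ZERO_LOG for c in g[:-1]]   # g is monic: drop g_{2t}


class SerialInterleavedEncoder:
    """Systematic R-S encoder for `depth` interleaved words. One multiplier is
    time-shared: each input symbol takes two_t serial cycles to update the
    parity symbols of its word, held in a RAM row selected round-robin."""

    def __init__(self, depth=16, two_t=32):
        self.depth, self.two_t = depth, two_t
        self.g_logs = generator_logs(two_t)
        self.parity = [[0] * two_t for _ in range(depth)]   # parity RAM
        self.count = 0

    def push(self, symbol):
        p = self.parity[self.count % self.depth]   # RAM selects this word's parity set
        fb = symbol ^ p[-1]                         # feedback: input plus top parity symbol
        for j in range(self.two_t - 1, 0, -1):      # serial cycles, highest stage first
            p[j] = p[j - 1] ^ gf_mul_log(fb, self.g_logs[j])
        p[0] = gf_mul_log(fb, self.g_logs[0])
        self.count += 1
```

After the K = 223 information symbols of every word have been pushed, the parity field of word w is row w of `parity`, read from index 2t-1 down to 0.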





2.5.3 Synchronization Implementation 

As described in Section 2.3, the Viterbi decoder provides inner-code synchronization and phase-ambiguity resolution, and the $IJ = 128$-bit sequence, consisting of 16 superchannel symbols, is used to obtain block-code array synchronization. At the moderate data rates required here, the code array synchronization can be implemented with a simple correlation detector. That is, for each received superchannel bit, the detector correlates the 128-bit received sequence terminating at that bit with a locally generated copy of the synchronization sequence, and compares the output with a threshold to determine the starting bit of the code array. The most recent 128 superchannel bits can be stored in a RAM, the synchronization sequence can be generated with 2 chips, and the correlator, consisting of an exclusive-OR circuit and a counter, can be implemented with a little over 2 chips. Adding a few chips for control circuits, a total of about 8 chips is required.

This, of course, requires that for each bit time ($\geq$ 10 µs) the locally generated synchronization sequence be shifted and modulo-2 added to the 128 most recently received bits in storage. Thus, the synchronization sequence must be shifted at a rate of up to 12.8 MHz (128 bits per 10-µs bit time), which is well within the capabilities of TTL logic.
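The correlation detector can be modeled in a few lines of Python. The 128-bit pattern below is a hypothetical stand-in generated by a simple linear recurrence; the real synchronization sequence is defined in Section 2.3. The names `sync_pattern` and `SyncCorrelator` are illustrative.

```python
from collections import deque


def sync_pattern(length=128):
    """Hypothetical 128-bit sync pattern: a pseudo-random sequence obeying
    s[n+7] = s[n] XOR s[n+1] (polynomial 1 + x + x^7)."""
    s = [1, 0, 0, 0, 0, 0, 0]              # any non-zero initial fill
    while len(s) < length:
        s.append(s[-7] ^ s[-6])
    return s


class SyncCorrelator:
    """For every new bit, correlates the most recent len(pattern) bits with
    the pattern and flags a match when the agreements reach `threshold`."""

    def __init__(self, pattern, threshold):
        self.pattern = list(pattern)
        self.threshold = threshold
        self.window = deque(maxlen=len(self.pattern))   # stands in for the RAM

    def push(self, bit):
        self.window.append(bit)
        if len(self.window) < len(self.pattern):
            return False
        # exclusive-OR each stored bit with the local copy and count disagreements
        disagreements = sum(r ^ s for r, s in zip(self.window, self.pattern))
        return len(self.pattern) - disagreements >= self.threshold
```

A threshold below 128 tolerates a few channel errors in the sync field. For example, `SyncCorrelator(sync_pattern(), threshold=120)` accepts up to 8 disagreements. The value 120 is only an illustration, not a figure from this report.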
2.5.4 Hardware Unscrambler Implementation

For the J = 8, I = 16 system being considered, the unscrambler of Figure 2.3.3 requires $2^{16}$ = 65,536 bits of storage. The best way of implementing this is to use sixteen 4096 x 1 MOS RAM's. Dynamic MOS shift registers could also be used, but they would have to be recirculated at the lower data rates. Using the MOS RAM's, 16 chips are required for the storage, and it is estimated that an additional 14 chips are required for the control and rather formidable addressing circuitry. Thus a total of 30 chips is required.

2.5.5 Hardware Reed-Solomon Decoder Design 

A sketch of the overall design of a R-S decoder is 
shown in Figure 2.5.4. Typically the decoder will be com- 
puting the syndromes for one word while the remaining de- 
coding steps are performed for the previous word. The 
Chien searcher checks each symbol to see if an error has 
occurred in the symbol about to be shifted out of the buffer. 

If so, the error value is computed and the symbol is corrected. 

A timing diagram of the R-S decoding operation is given in Figure 2.5.5. The lines in this figure indicate the relative amounts of time and the sequence of operations in the Reed-Solomon decoding procedure.
(Figure 2.5.4 labels: Register of about 4096 bits for bits awaiting correction; Syndrome Calculation)
2.5.5.1 Syndrome Calculation

As with the encoder implementation, the number of 
chips required to implement the syndrome calculation can 
be greatly reduced by using a serial instead of a parallel 
implementation. Figure 2.5.6 shows a serial implementation 
of this procedure. Referring to Figure 2.3.4, the counter of Figure 2.5.6 generates the logs of the $\alpha^i$ elements and
the lower RAM contains the storage for the syndromes being 
calculated. During the period immediately following the 
first received symbol of the word, the feedback is removed 
and in 32 serial steps the first term of each of the syn- 
dromes is written into the lower RAM. When the remaining 
symbols in the word are received, the feedback is used to 
modify the syndromes as shown in Figure 2.3.4. Again 32 
steps per received symbol are required to modify all of the 
syndromes. On the last series of modifications, i.e., after 
the last symbol of the word is received, the syndromes are 
also stored in the upper RAM's for use in the other decoding 
steps. 

The estimated numbers of TTL chips required to implement the various steps and the control circuits are shown in Figure 2.5.6.
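The serial syndrome update can be sketched in Python by applying Horner's rule once per received symbol. The sketch assumes syndromes $S_j = r(\alpha^j)$ for $j = 1, \ldots, 32$ (the roots are set in Section 2.3) and the field of Eq. (2.5.2). The function name is hypothetical.

```python
# GF(2^8) tables for M(D) = 1 + D^2 + D^3 + D^4 + D^8 (mask 0x11D)
LOG, ANTILOG = [0] * 256, [0] * 255
_x = 1
for _k in range(255):
    ANTILOG[_k], LOG[_x] = _x, _k
    _x = (_x << 1) ^ (0x11D if _x & 0x80 else 0)


def serial_syndromes(received, two_t=32, first_root=1):
    """S_j = r(alpha^j) for j = first_root .. first_root + two_t - 1.

    `received` lists symbols highest-degree first (transmission order).
    Each symbol triggers two_t serial updates S_j <- S_j * alpha^j + r,
    with the multiplier log j supplied by a counter; for the first
    symbol the feedback is removed and S_j is simply set to r.
    """
    S = [0] * two_t
    for n, r in enumerate(received):
        for k in range(two_t):
            j = first_root + k                    # counter output = log(alpha^j)
            if n == 0 or S[k] == 0:
                fed_back = 0
            else:
                fed_back = ANTILOG[(LOG[S[k]] + j) % 255]
            S[k] = fed_back ^ r
    return S
```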

Figure 2.5.6 Syndrome Calculation

2.5.5.2 Berlekamp Algorithm Implementation

Reference 10 provides a good description of the Berlekamp Iterative Algorithm. Basically, the algorithm synthesizes the shortest shift register that will generate the syndrome sequence. The resulting tap connections are the coefficients of the error-locator polynomial. As described in Reference 10 and illustrated in Figure 6.7.4 of Reference 10, at each iteration the algorithm computes the discrepancy, $d_n$, between the next syndrome and the next output of the present shift register. If this discrepancy is not zero, a new set of tap connections is generated.

Figure 2.5.7 gives a simplified block diagram of the 
algorithm and Figures 2.5.8 and 2.5.9 outline an implementa- 
tion of the two main parts of this block diagram. 

In Figure 2.5.8 the R1 RAM contains the present set of shift register tap connections, and the $d_n$ register accumulates the terms of the next discrepancy as indicated.

The diagram of Figure 2.5.9 illustrates the operation of the main processor in the notation of Figure 6.7.4 of Reference 10. At each iteration this processor checks to see if the next discrepancy is zero. If it is, the processor merely shifts the words in the R3 RAM one address location and inserts a "0" symbol. If the next discrepancy is not zero, a new set of shift register tap connections must be computed. This is accomplished by modifying each of the 16 words in the RAM's as shown and then shifting the words in the R3 RAM one address location and inserting a "0" or a "1" symbol, depending on the sign of $n - 2\ell_n$. Also, if $d_n \neq 0$ and $n > 2\ell_n$, $\ell_n$ and $d^*$ must be updated as indicated.
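For reference, the shift-register synthesis can be sketched in Python with the widely used Berlekamp-Massey formulation. It tracks the same quantities as the hardware: the discrepancy $d_n$, the register length, and the last non-zero discrepancy. Its bookkeeping may differ from Figure 6.7.4 of Reference 10. The indexing S[0] = $S_1$, the 0x11D field, and the function names are assumptions.

```python
LOG, ANTILOG = [0] * 256, [0] * 255
_x = 1
for _k in range(255):
    ANTILOG[_k], LOG[_x] = _x, _k
    _x = (_x << 1) ^ (0x11D if _x & 0x80 else 0)


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else ANTILOG[(LOG[a] + LOG[b]) % 255]


def gf_inv(a):
    return ANTILOG[(255 - LOG[a]) % 255]


def berlekamp_massey(S):
    """Shortest LFSR generating S[0], S[1], ...; returns (sigma, length),
    with sigma[0] = 1 and sigma[i] the tap for S[n - i]."""
    sigma, prev = [1], [1]        # current and last-updated connection polynomials
    L, m, b = 0, 1, 1             # register length, steps since update, last d != 0
    for n in range(len(S)):
        d = S[n]                  # discrepancy d_n
        for i in range(1, min(L, len(sigma) - 1) + 1):
            d ^= gf_mul(sigma[i], S[n - i])
        if d == 0:                # taps unchanged
            m += 1
            continue
        coef = gf_mul(d, gf_inv(b))
        new = sigma + [0] * max(0, len(prev) + m - len(sigma))
        for i, c in enumerate(prev):
            new[i + m] ^= gf_mul(coef, c)     # sigma - (d/b) x^m prev
        if 2 * L <= n:                        # register must be lengthened
            prev, b, L, m = sigma, d, n + 1 - L, 1
        else:
            m += 1
        sigma = new
    return sigma[:L + 1], L
```

The roots of the returned polynomial are the inverse error locators $\alpha^{-p}$, which is what the Chien search later tests for.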
Figure 2.5.8 $d_n$ Calculation Procedure

Figure 2.5.9 (annotations: if $d_n = 0$, shift the words in the RAM one address location and insert a "0" symbol; if $d_n \neq 0$, then after the words in the RAM are updated, shift one address location and insert a "0" or a "1" symbol as indicated)
Figure 2.5.10 sketches an implementation of this 
algorithm. This implementation uses the procedures out- 
lined in Figures 2.5.8 and 2.5.9 and adds circuitry to 
implement some of the other operations. 

The dotted line from the $d_n = 0$ tester indicates that when $d_n = 0$, control is shifted to the R3 RAM as described in Figure 2.5.9. The other dotted lines indicate that, as before, when a multiplier input is zero, the output is set to zero.
Figure 2.5.10 Berlekamp Algorithm Implementation
2.5.5.3 Chien Search and Error Evaluation Preparation

The Chien search and error evaluation preparation step stores and, when necessary, computes the coefficients of the $\Lambda$, $\sigma$, and $\sigma'$ polynomials so that they can be used efficiently in the Chien search and error evaluation procedure. The coefficients of the $\sigma$ polynomial, $\sigma_i$, are computed with the Berlekamp Algorithm and, as shown in Equation 2.3.7, the coefficients of the $\sigma'$ polynomial, $\sigma'_i$, are equal to $\sigma_{i+1}$ for $i$ even and zero otherwise. So this step merely stores these coefficients so that they can be readily accessed by the Chien search and error evaluation circuits.
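The rule for $\sigma'_i$ is the formal derivative taken in characteristic 2. In that field, every even integer multiple of a coefficient vanishes. A short Python sketch follows; the function name is hypothetical.

```python
def formal_derivative(sigma):
    """Coefficients of sigma'(x) over GF(2^m), lowest degree first.

    The derivative of sigma_{i+1} x^(i+1) is (i+1) sigma_{i+1} x^i, and the
    integer factor (i+1) is 0 in characteristic 2 whenever i+1 is even.
    """
    return [sigma[i + 1] if i % 2 == 0 else 0 for i in range(len(sigma) - 1)]
```

For example, `formal_derivative([1, 5, 7, 9, 11])` gives `[5, 0, 9, 0]`, which keeps only $\sigma_1$ and $\sigma_3$.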

The coefficients of the $\Lambda$ polynomial must be computed. They can be computed directly from formula (2.3.8), or their calculation can be incorporated into the Berlekamp Algorithm (Reference 10). In this case, the direct approach appears to be less complex to implement. Figure 2.5.11 gives an outline of an implementation using this approach. This implementation accumulates the sum defining each coefficient in the temporary $\Lambda_i$ storage register and then stores the result in the $\Lambda_i$ RAM.
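The direct computation can be sketched in Python. The sketch assumes that formula (2.3.8) is the usual key-equation form $\Lambda(x) = S(x)\sigma(x) \bmod x^{2t}$ with $S(x) = S_1 + S_2 x + \cdots$. Under that assumption, $\Lambda_i = \sum_{j=0}^{i} \sigma_j S_{i-j+1}$. The 0x11D field and the function name are also assumptions.

```python
LOG, ANTILOG = [0] * 256, [0] * 255
_x = 1
for _k in range(255):
    ANTILOG[_k], LOG[_x] = _x, _k
    _x = (_x << 1) ^ (0x11D if _x & 0x80 else 0)


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else ANTILOG[(LOG[a] + LOG[b]) % 255]


def evaluator_coeffs(S, sigma):
    """Lambda_i = sum_{j=0..i} sigma_j * S_{i-j+1} for i = 0 .. deg(sigma) - 1.

    S[k] holds S_{k+1} and sigma[0] = 1. Each coefficient is accumulated in a
    temporary register and then written to the Lambda_i RAM (here, a list).
    """
    lam_ram = []
    for i in range(len(sigma) - 1):
        acc = 0                                   # temporary Lambda_i register
        for j in range(i + 1):
            acc ^= gf_mul(sigma[j], S[i - j])
        lam_ram.append(acc)
    return lam_ram
```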
Figure 2.5.11 Chien Search and Error Evaluation Preparation (label: $\Lambda_i$ RAM connections with error evaluation circuits)
2.5.5.4 Chien Search and Error Evaluation

As described in Section 2.3.3, the Chien search determines whether a given symbol is in error by computing $\sigma(\alpha^{-n})$. If this quantity is nonzero, the $n$th symbol is said to be correct. Otherwise, an error of value $\Lambda(\alpha^{-n})/\sigma'(\alpha^{-n})$ is said to have occurred in that symbol.

Figure 2.5.12 gives an implementation of this procedure. This implementation performs the Chien search and evaluates the $\Lambda$ and $\sigma'$ polynomials in parallel, first for $n = N-1$, then for $n = N-2$, and so forth, where $N = 2^J - 1 = 255$.

In the first step the circuitry accumulates

$$\sum_{i=0}^{16} \sigma_i \alpha^{-(N-1)i}, \qquad \sum_{i=0}^{15} \Lambda_i \alpha^{-(N-1)i}, \qquad \sum_{i=0}^{15} \sigma'_i \alpha^{-(N-1)i}$$

in the $\sigma$, $\Lambda$, and $\sigma'$ storage registers, respectively. Then the NOR gate checks to see if the first received symbol, symbol $N-1$, is correct. That is, it checks to see if $\sigma(\alpha^{-(N-1)})$ is nonzero. If so, the output AND gate produces a sequence of 8 zero bits. If the NOR gate output is high, an error is indicated. In this case $\Lambda(\alpha^{-(N-1)})/\sigma'(\alpha^{-(N-1)})$ is formed as shown, and this error sequence is selected as the output.

At the $k$th step this implementation evaluates the $\sigma$, $\Lambda$, and $\sigma'$ polynomials at $\alpha^{-(N-k)}$, checks to see if the $k$th received symbol is in error, computes the value of the error if there is one, and outputs an estimate of the superchannel bit error sequence corresponding to the $k$th received symbol.
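The search and error-value computation can be sketched in Python. The sketch assumes the 0x11D field, syndromes $S_1, \ldots, S_{32}$, and $\Lambda$ as in the key-equation form above. The hardware updates each term register with a constant multiply per step; for brevity, this sketch recomputes each polynomial by Horner's rule at every point. All names are hypothetical.

```python
LOG, ANTILOG = [0] * 256, [0] * 255
_x = 1
for _k in range(255):
    ANTILOG[_k], LOG[_x] = _x, _k
    _x = (_x << 1) ^ (0x11D if _x & 0x80 else 0)


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else ANTILOG[(LOG[a] + LOG[b]) % 255]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("sigma' vanished at a located root")
    return 0 if a == 0 else ANTILOG[(LOG[a] - LOG[b]) % 255]


def poly_eval(poly, x):
    """Evaluate poly (lowest-degree coefficient first) at x by Horner's rule."""
    acc = 0
    for c in reversed(poly):
        acc = gf_mul(acc, x) ^ c
    return acc


def chien_forney(sigma, lam, N=255):
    """Scan n = N-1, N-2, ..., 0 and return {n: error value} wherever
    sigma(alpha^-n) = 0; the value is Lambda(alpha^-n) / sigma'(alpha^-n)."""
    dsigma = [sigma[i + 1] if i % 2 == 0 else 0 for i in range(len(sigma) - 1)]
    errors = {}
    for n in range(N - 1, -1, -1):
        x = ANTILOG[(-n) % 255]              # alpha^(-n)
        if poly_eval(sigma, x) == 0:         # NOR gate high: symbol n is in error
            errors[n] = gf_div(poly_eval(lam, x), poly_eval(dsigma, x))
    return errors
```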
2.5.6 Hardware Implementation Summary

Table 2.5.1 summarizes the number of chips required to implement in hardware the various operations in this concatenated coding system. This table shows that most of this system can be implemented with TTL logic. MOS chips are used only in the unscrambler and in the delay-line storage during R-S decoding, where large amounts of storage are required. A total of seventeen 4096 x 1 MOS RAM's are used for these storage purposes.

The table shows that the decoder for this concatenated coding system can be implemented with only a little over twice the number of chips of the basic Viterbi decoder. That is, it requires a few more chips than a K = 9, R = 1/3 Viterbi decoder. However, this concatenated coding system requires only 1.93 and 2.18 dB to achieve bit error probabilities of $10^{-4}$ and $10^{-7}$, respectively, while the K = 9, R = 1/3 Viterbi decoder system requires about 2.6 and 4.2 dB. To obtain the same performance as this concatenated coding system, a considerably longer and exponentially more complex Viterbi decoder system would be required.
3.0 Hybrid Bootstrap Decoding

Performance results for hybrid bootstrap decoding (Refs. 16, 17, 18), based on extensive simulations by Herman, are presented in two papers (Refs. 14, 15). The design considered in Section 3.2 follows these papers very closely, since alternate schemes have not been adequately tested. We are particularly interested in the rate 1/3, one-group, soft-decision decoder without multiple processing, which achieves performance comparable to that of concatenated coding. Performance is reviewed in Section 3.1, and an implementation based on MECL 10,000 logic is presented in Section 3.2. Suggestions and comments on other approaches are contained in Section 3.3.

A careful comparison and evaluation of hybrid and concatenated coding is contained in Section 4.

3.1 Performance Results 

In hybrid decoding, the principal source of failure is block erasure due to inadequate time to decode. Undetected errors also occur. An undetected output bit error rate of $2.5 \times 10^{-6}$ near $R_{comp}$ is cited in Ref. 14 for the rate 1/2 code. It is anticipated, however, that with proper choice of parameters and during operation at rates below $R_{comp}$, that is, with $E_b/N_0$ of 1.5 dB or higher, the undetected error rate will be significantly lower than $10^{-?}$ and not a significant cause of system degradation.
An erasure occurs whenever the number of computations
required to decode a block exceeds the number of computations 
that can be performed by the decoder during the time allotted 
for decoding the block. In real-time decoding, this number 
is approximately equal to the computational speed of the 
decoder times the time required to transmit one block. 
Buffering external to the decoder will permit additional 
time to be devoted to difficult blocks, beyond that required 
to transmit the block, but this effect is not major unless 
very large buffers (and delays), or off-line processing, 
is provided. 

Fig. 3.1.1 is an extrapolation of the results of Fig. 1 of Ref. 15, the normalized computational distribution for a rate 1/3, 7-track bootstrap decoder. These curves may be used to approximate system performance as follows. A decoder capable of performing $D$ computations per second can perform $L_T = D \times 3000/R$ computations during the time required to transmit a block of 3000 bits at an information bit rate of $R$ bits per second. The normalized total number of computations is obtained by dividing $L_T$ by the number of information bits, yielding

$$\mu = L_T/3000 = D/R.$$

Thus, the normalized total number of computations per block is just the computational speed factor, $\mu$, defined as the ratio of the computational rate of the decoder to the information bit rate. For a decoder capable of 15 megacomputations per second (MCPS), $\mu = 150$ at a data rate of 100 Kbps and $\mu = 1500$ at a data rate of 10 Kbps. The curves of Fig. 3.1.1 then indicate an erasure rate of $7 \times 10^{-?}$ at an $E_b/N_0$ of approximately 2.2 dB at 100 Kbps, and an erasure rate of $3 \times 10^{-4}$ at an $E_b/N_0$ of approximately 1.7 dB at 10 Kbps.
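The speed-factor arithmetic and the real-time erasure criterion can be written out directly in Python. The sketch assumes no buffering beyond one block time and uses hypothetical function names.

```python
BLOCK_BITS = 3000   # bits per block, as in the text


def computations_per_block(D, R):
    """L_T = D * 3000 / R: computations available while one block is transmitted."""
    return D * BLOCK_BITS / R


def speed_factor(D, R):
    """mu = L_T / 3000 = D / R."""
    return computations_per_block(D, R) / BLOCK_BITS


def is_erased(needed, D, R):
    """Real-time decoding without extra buffering: a block is erased when it
    needs more computations than L_T."""
    return needed > computations_per_block(D, R)
```

With D = 15e6 computations per second, `speed_factor(15e6, 100e3)` is 150.0 and `speed_factor(15e6, 10e3)` is 1500.0, matching the values quoted above.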

Curves of erasure probability vs. $E_b/N_0$ at data rates of 10 and 100 Kbps, assuming a 15 MCPS decoder, are presented in Fig. 3.1.2. It should be noted that these curves are based on rather uncertain extrapolation and thus are subject to considerable inaccuracy. The performance of a 16-error-correcting concatenated Reed-Solomon/Viterbi decoder is also shown in Fig. 3.1.2 as a curve of block error probability vs. $E_b/N_0$. The hybrid decoder operating at 10 Kbps appears to have a slight performance advantage down to block error or erasure probabilities of $10^{-7}$. At lower speed factors, hybrid decoding appears to suffer badly. In particular, at 100 Kbps, hybrid decoding is quite inferior for block erasure probabilities less than $10^{-4}$.

The reason for this inferior performance is not clear. Fig. 2 of Ref. 15 shows an unexplained decrease in the Pareto slope, $\alpha$, for the rate 1/3 code as $E_b/N_0$ is increased from 2 to 3 dB. It is this decrease that shows up as inferior hybrid decoding performance at 100 Kbps above 2 dB. Whether this is a basic problem, a quirk in the implementation, or the result of overly ambitious extrapolation remains to be determined.
Figure 3.1.2 Erasure Probability for 15 MCPS Hybrid Sequential Decoder, Rate 1/3, Octal Channel
It is clear that there are significant advantages to a high speed factor. The design for a soft-decision, rate 1/3 hybrid decoder is discussed in Section 3.2. A faster computation rate does not presently appear to be practical.

3.2 Hybrid Bootstrap Sequential Decoder Implementation 

A design of a hybrid sequential decoder, using the 
algorithm presented in Ref. 14, is described in this section. 
A block diagram of the decoder is shown in Fig. 3.2.1. When 
the decoder completes the decoding of a block, the received 
symbols for the next block are loaded into the decoder memory 
while the decoded data from the previous block is being read 
out. Simultaneously with the loading of the received symbols, the state track is generated and loaded into the decoder memory.

Three state bits are generated from the received 
symbols for each of the 512 words in the block. The state 
bits are computed as follows: the first state bit of word n 
is equal to the even parity of the sign bits of received 
symbol one for each of the seven tracks in word n; the second 
state bit is equal to the even parity of the sign bits for 
received symbol two, etc. Three more state bits in word n 
are the binary representation of KLEFT, the number of tracks 
that have not yet decoded past word n. When the memory is 
first loaded, KLEFT is set equal to seven in all 512 words. 
The final bit of the state, referred to as the alternate branch state bit, is particular to the track presently being decoded. It is set equal to 1 on a forward move along the best branch from a node, and to 0 on a forward move along the alternate branch.

Figure 3.2.1 HYBRID BOOTSTRAP SEQUENTIAL DECODER
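The state computation for one word can be sketched in Python. Sign bits are given per track and per received symbol. The packing order of the seven state bits is a hypothetical layout, not one specified here.

```python
def parity_state_bits(sign_bits):
    """Three parity state bits for one word.

    sign_bits[t][s] is the sign bit of received symbol s (0, 1, 2) on track t
    (0..6). State bit s is the even parity (XOR) of that symbol's sign bits
    taken over the seven tracks.
    """
    return [sum(track[s] for track in sign_bits) % 2 for s in range(3)]


def pack_state(parity, kleft, alt_branch):
    """Hypothetical bit layout of the seven-bit state word:
    bits 0-2 parity, bits 3-5 KLEFT, bit 6 alternate branch state bit."""
    assert 0 <= kleft <= 7 and alt_branch in (0, 1)
    word = parity[0] | parity[1] << 1 | parity[2] << 2
    return word | kleft << 3 | alt_branch << 6
```

When the memory is first loaded, every word would be packed with `kleft=7`.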

3.2.1 Decoder Memory Organization 

A total memory size of 512 words is required. The total memory is divided into three sections: one for received symbol storage, one for information bit storage, and one for decoder state storage. The received symbol and the
information bit sections are divided into seven independent 
tracks. Each track has its own address counter. The re- 
ceived symbol and information bit storage for a given track 
share the same address counter. The state memory is ad- 
dressed by an address counter which is loaded from the 
track address counter of the track currently being decoded. 

The received symbol storage requires a nine-bit word for each track, for a total of 512 x 7 x 9 = 32,256 bits of storage. The information bit storage requires only one bit per track, for a total of 512 x 7 = 3,584 bits. The state memory has only a single track; a seven-bit word is required, for a total of 512 x 7 = 3,584 bits. Thus, the total storage required is 39,424 bits. The cycle time must be approximately 50-60 ns. It appears that these requirements can best be met with the Fairchild 95410 256-bit ECL memory. A total of 154 of these devices is required to build this memory.
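The storage figures above can be checked with a few lines of Python arithmetic. The check ignores any control or spare bits.

```python
WORDS, TRACKS = 512, 7
received_bits = WORDS * TRACKS * 9   # nine-bit received-symbol word per track
info_bits = WORDS * TRACKS * 1       # one information bit per track
state_bits = WORDS * 7               # single track, one seven-bit state word per word
total_bits = received_bits + info_bits + state_bits
devices = -(-total_bits // 256)      # 256-bit memory chips, rounded up
print(received_bits, info_bits, state_bits, total_bits, devices)
# 32256 3584 3584 39424 154
```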
While the decoder is computing the present node, the memory reads out the received symbols and state bits for the next computation. Since the decoder may move either forward or backward, the received symbols and state bits for both the next node and the previous node must be provided.

3.2.2 Decoding Logic 

Since the decoding logic speed determines the 
overall computation rate, it has been worked out in some 
detail. A block diagram of the decoding logic is shown 
in Fig. 3.2.2. The decoding logic has two modes, the look- 
forward mode and the look-back mode. The decoder is in the 
look-forward mode if the present node was arrived at by a 
forward step. Otherwise, the decoder is in the look-back 
mode. 

The node metric, or MT, register contains the cumulative metric minus the cumulative threshold for the present node. A 16-bit register for use with symbol metric values quantized to 12 bits is assumed, based on simulations performed by L. Hofman and summarized in Fig. 3.2.3. The two curves encompass a range of choices of metric quantization and of KLEFT. Hofman notes that, by extrapolation, a 16-bit MT register can be expected to overflow about once every $5 \times 10^{33}$ blocks, whereas a 14-bit register could be expected to overflow every $5 \times 10^{4}$ blocks when used with 12-bit symbol metrics. The choice of 16 bits thus appears to be reasonable. Symbol metric quantization is discussed in connection with Fig. 3.2.4.
Figure 3.2.2 Hybrid Bootstrap Sequential Decoder Branch

Figure 3.2.4 Sensitivity of Bootstrap Decoder Computations to Quantization of Metrics and of KLEFT.
The MT register is set to zero when starting or resuming the decoding of a track. When in the forward mode, the metric calculator simultaneously computes the metrics of the two successor nodes of the present node by adding the three symbol metrics for each branch to MT. The better of the two metrics is tested for threshold violation (negative value of MT). If the threshold is violated, the decoder steps back to the previous node. Otherwise, the decoder steps forward, sets the alternate branch state bit to 1, and tests the best metric for a possible threshold tightening. The metric MT is decreased by $\Delta$ if the previous metric was less than $\Delta$ and the best new metric is greater than or equal to $\Delta$. The resulting metric is then stored in the metric register, and the decoder steps forward on the best branch.

In the back-up mode, the metrics of the present branch and the alternate branch are calculated simultaneously. If the present metric is below threshold, then the threshold is loosened by adding $\Delta$ to MT, and the decoder steps forward on the best branch. If this is not the case, and if the alternate branch available state bit is 1, then the metric of the alternate branch is tested for threshold violation. If the alternate branch metric is above threshold, then the decoder steps forward to the alternate branch, setting the alternate branch available state bit to 0; otherwise, the decoder steps back.
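The look-forward decision can be sketched in Python as a pure function of the MT register and the two summed branch metrics. The back-up mode is not modeled. The tie-break toward the upper branch and the function name are assumptions.

```python
def forward_step(mt, upper, lower, delta):
    """Look-forward decision for one node, following the rules above.

    mt:    MT register, cumulative metric minus cumulative threshold
    upper: summed symbol metrics of the upper (0) branch
    lower: summed symbol metrics of the lower (1) branch
    delta: threshold increment
    Returns (move, new_mt, information_bit, alternate_branch_bit).
    """
    candidates = (mt + upper, mt + lower)
    bit = 0 if candidates[0] >= candidates[1] else 1   # ties go to the upper branch (assumed)
    best = candidates[bit]
    if best < 0:                                       # threshold violated: step back
        return ("back", mt, None, None)
    new_mt = best
    if mt < delta <= best:                             # threshold tightening
        new_mt -= delta
    return ("forward", new_mt, bit, 1)                 # forward on best branch, alt bit = 1
```

For example, `forward_step(5, 3, -10, 8)` moves forward on the upper branch. The new metric 8 crosses $\Delta = 8$ from below, so the threshold is tightened and MT becomes 0.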
The branch metrics are computed from the symbol metrics and the previous node metric. The only practical way of generating the symbol metrics is by storing the values in three identical look-up tables, one for each symbol. Each look-up table is composed of six 256-bit MECL 10,000 ROM's and is addressed by three bits. These devices (soon to become available) will have access times of about 17 nanoseconds. Each symbol look-up table provides two sets of symbol metrics: one for the upper (0) branch and one for the lower (1) branch. Each symbol metric is stored to 12-bit precision. Initial simulations by Hofman indicate that, with an appropriate choice of KLEFT quantization to 2 bits, metric table quantization to 12 bits has negligible impact on computational requirements. A more extensive simulation appears to be warranted, however, before parameter choices are frozen. Hofman's results are presented in Fig. 3.2.4. Although they were obtained for a rate 1/2 code, no differences are anticipated for a rate 1/3 code.

In forward mode, the two branch metrics are formed 
by summing the symbol metrics with the contents, MT, of the 
node metric register. These two results are then subtracted 
from each other to determine the larger of the two. Thresh- 
old changes are obtained by adding or subtracting A from 
both the upper and lower branch metrics while they are 
being compared. 
In the backward mode, the present node metric is determined by subtracting the upper branch metric from MT. The alternate node metric is computed by adding the alternate branch metric to the present node metric.

The appropriate metric is selected by a three-input multiplexer and stored as the new value of MT in the node metric register. The decision which determines the best metric also determines the information bit. The information bits are shifted into an encoder which determines the check bits for the next computation. After the information bits shift through the encoder, they are stored in the appropriate track of the information bit memory.

As the decoder moves forward, the state bits are 
updated. Each check bit from the encoder is exclusive-OR'd 
with the sign bit of the corresponding received symbol. 

The result is exclusive-OR’d with the corresponding state 
parity bit and stored as the new state parity bit. At the 
same time, the quantity, KLEFT, is decreased by one. When 
backing up, KLEFT is increased by one and the state parity 
bits are changed back to their original condition. The 
alternate branch state bit is set to 1 or 0, depending on 
whether the forward move is along the best or worst branch, 
respectively. 
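The state bookkeeping for forward and backward moves can be sketched in Python. Because exclusive-OR is its own inverse, backing up with the same check bits and sign bits restores the original parity bits. The function names are hypothetical.

```python
def advance_state(parity, kleft, check_bits, sign_bits):
    """Forward move: parity <- parity XOR check bit XOR received sign bit; KLEFT - 1."""
    parity = [p ^ c ^ s for p, c, s in zip(parity, check_bits, sign_bits)]
    return parity, kleft - 1


def retreat_state(parity, kleft, check_bits, sign_bits):
    """Backward move: the same XOR undoes the forward update; KLEFT + 1."""
    parity = [p ^ c ^ s for p, c, s in zip(parity, check_bits, sign_bits)]
    return parity, kleft + 1


def alternate_branch_bit(moved_on_best):
    """1 after a forward move along the best branch, 0 along the worst branch."""
    return 1 if moved_on_best else 0
```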
3.2.3 Track Control Logic

The function of the track control logic is to 
monitor the performance of the decoder and to switch to 
another track when the decoder bogs down on the present 
track. The decoder's progress on the present track is 
monitored by a counter which is incremented when the de- 
coder threshold is loosened. The counter is reset to 
zero when the decoder tightens threshold. The number in 
this counter is continuously compared with a stopping 
threshold, DSTOP. If the threshold is violated, then 
the track control logic switches the decoder to the next 
unfinished track. 

The decoder's penetration is also monitored by an 
up/down counter which is zeroed when decoding switches 
to a new track. If, when the DSTOP threshold is violated, 
the decoder has penetrated far enough that progress has 
been made, a register, KROUND, is reset. Otherwise, 

KROUND is incremented by one. 

If KROUND becomes equal to the number of unfinished 
tracks, then progress is no longer being made by any of 
the unfinished tracks. In this case, the unfinished tracks 
are restarted at the beginning of the block and the stop- 
ping threshold, DSTOP, is loosened. 
The stopping threshold, DSTOP, is a function of KLEFT and DI, the number of times the unfinished tracks have been initialized. The quantity KLEFT is stored in the state memory. The quantity DI is the contents of a 2-bit counter, initially zero, which is incremented whenever all unfinished tracks become stalled, as determined from KROUND equaling KLEFT. The DI counter is reset whenever a new track is finished. The stopping thresholds are stored in 32 words of a single MECL 10,000 ROM addressed by KLEFT (3 bits) and DI (2 bits).

When the stopping threshold, DSTOP, is violated,
the track control logic stops the decoding of the
present track and begins the decoding of a new track.
If KROUND equals KLEFT, all unfinished tracks are stalled
and are therefore reinitialized. The simulations of
Refs. 14 and 15 assume restarting at the track origins.
However, some time, and probably some computation, would be
saved if reinitialization were achieved by starting at a
point between the origin and the present node, that is,
by backing up a fixed distance after stalling.
When switching to a new track, only those nodes
which lie ten or more nodes behind the present node are
considered to be "definitely" decoded. Since the state
has been updated to the present node, the decoder is backed
up ten nodes, thus restoring the state bits of non-
definitely decoded nodes to their previous values. To
provide correct information for the next restart, the
decoder is then forced forward 24 nodes, with all decoding
operations suspended, thus storing the encoder contents
in the information bit memory.

Track change is then accomplished by incrementing
the 3-bit track pointer counter, which selects the active
track. The track address counter of the new active track
is checked to see if this track is completely decoded.
If so, the track pointer counter is incremented until an
unfinished track is found.

Decoding of this track is started by first loading
the encoder, by forcing the decoder to back up 24 nodes.
The metric register and the progress counter are then reset
to zero. The present node then forms a "pseudo origin"
for the subsequent decoding operations. This completes
the switching process, and the decoder is allowed to
proceed until the stopping threshold is violated again, or
until the track is completely decoded.
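The switching sequence above can be summarized in a short Python sketch. The node counts (10 and 24) come from the text. The modulo-8 wrap is simply what a 3-bit counter does. The step names returned by the hypothetical `switch_track` helper are illustrative labels, not signals from the design.

```python
from typing import List, Optional, Tuple

BACKUP_NODES = 10   # nodes not yet "definitely" decoded (from the text)
ENCODER_NODES = 24  # moves used to store/load the encoder contents (from the text)


def next_unfinished(pointer: int, finished: List[bool]) -> Optional[int]:
    """Advance the 3-bit track pointer (mod 8) until an unfinished track
    is found; return None if all eight tracks are decoded."""
    for step in range(1, 9):
        candidate = (pointer + step) % 8
        if not finished[candidate]:
            return candidate
    return None


def switch_track(pointer: int, finished: List[bool]) -> List[Tuple[str, object]]:
    """Ordered switching steps (hypothetical step names)."""
    new_track = next_unfinished(pointer, finished)
    if new_track is None:
        return [("block_done", None)]
    return [
        ("back_up", BACKUP_NODES),         # restore state bits of undecided nodes
        ("force_forward", ENCODER_NODES),  # store encoder contents, decoding suspended
        ("select_track", new_track),
        ("back_up", ENCODER_NODES),        # load the encoder for the new track
        ("reset_metric_and_progress", 0),  # present node becomes the pseudo origin
    ]
```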

In the event that all unfinished tracks are stalled,
they must be restarted at the beginning of the block
or at intermediate points. But first the state must be
cleared of the effects of the unfinished decoders. This
is accomplished by loading each unfinished track into the
decoder and forcing it to back up a fixed number of nodes
or to the beginning of the track. Decoding then resumes,
but with a looser stopping threshold.

3.2.4 Parts Count Estimation 


The parts count necessary to implement the decoder
has been estimated. The estimate is based on the use of 
presently available ECL 10,000 and 9,500 series logic cir- 
cuits. The MECL 10139 ROM or equivalent has been assumed 
to be available in the near future. The parts count has 
been broken down as follows: 


   Function                              Approx. I.C. Count
   Memory and Associated Registers              170
   Metric Calculation                            50
   Metric Testing and Selection                  30
   State Calculation & Update Logic              15
   Track Control Logic                           30
   Encoder                                       25
   Memory Address Registers                      30
   Miscellaneous                                 50
   External Buffer                               50
   TOTAL                                        450

Table 3.1.1 Approximate I.C. Requirements for Hybrid Bootstrap Sequential Decoding.


This number of circuits can be packaged on 5-6 
circuit boards approximately 8x8 inches in size. Prime 
power requirements are approximately 400 watts, assuming 
50% power supply efficiency. 
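The short Python sketch below is a consistency check on Table 3.1.1 and the power estimate. The arithmetic is ours, not part of the original study. It sums the counts and derives implied averages per package and per board, assuming the 400 W prime-power figure covers all 450 packages.

```python
# Approximate I.C. counts from Table 3.1.1
ic_counts = {
    "Memory and Associated Registers": 170,
    "Metric Calculation": 50,
    "Metric Testing and Selection": 30,
    "State Calculation & Update Logic": 15,
    "Track Control Logic": 30,
    "Encoder": 25,
    "Memory Address Registers": 30,
    "Miscellaneous": 50,
    "External Buffer": 50,
}
total_ics = sum(ic_counts.values())               # should equal the table TOTAL
prime_power_w = 400.0                             # quoted prime power
supply_efficiency = 0.5                           # quoted supply efficiency
logic_power_w = prime_power_w * supply_efficiency # power delivered to the logic
watts_per_ic = logic_power_w / total_ics          # implied average per package
ics_per_board = (total_ics / 6, total_ics / 5)    # for the quoted 5 to 6 boards
```

The entries add up to the stated total of 450. The implied average is roughly 0.44 W per package, with 75 to 90 packages per board.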
3.2.5 Decoder Computation Rate


The part of the decoder that determines the maximum
computation rate is the branch metric calculation and
selection circuitry shown in Figure 3.2.2. Note that the
entire branch metric computation is done in one computational
cycle. The total delay through this circuitry is approximately
60 nanoseconds, including set-up and propagation delays of
the flip-flop registers involved. This is the basis for the
15 Megacomputations per second decoding speed forecast.
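The link between the 60 ns path delay and the 15 Megacomputations per second forecast is simple arithmetic, sketched below. Our reading is that the forecast leaves roughly 11% timing margin. The report does not say how the 15 M/s figure was rounded.

```python
delay_ns = 60.0                             # quoted delay per computational cycle
max_rate = 1e9 / delay_ns                   # one computation per cycle
forecast_rate = 15e6                        # quoted forecast, computations/s
implied_cycle_ns = 1e9 / forecast_rate      # cycle time the forecast implies
margin = implied_cycle_ns / delay_ns - 1    # fractional slack over the 60 ns path
print(f"{max_rate / 1e6:.1f} Mcomputations/s, cycle for 15 M/s = {implied_cycle_ns:.1f} ns")
# prints: 16.7 Mcomputations/s, cycle for 15 M/s = 66.7 ns
```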

It is possible to speed up the process by the use of
pipeline techniques, i.e., by doing part of the metric
calculation on the previous computational cycle. The difficulty
here is that whatever portion of the hardware operates on
the previous cycle must compute branch metrics for three
times as many nodes. This is because the present
computational cycle may step back, or step forward to two
different nodes, and symbol metrics have to be provided for
all three possibilities.

The use of MECL III in the symbol metric summers was
considered briefly and rejected in favor of the ECL 10181
arithmetic logic unit. It was found that only a small
improvement could be made in propagation delay, at greatly
increased chip count and cost. In fact, the increased size
of the resulting circuit board layout would probably cancel
the smaller propagation delay because of increased wire
length.
3.3 Other Bootstrap Decoding Techniques


The design of Section 3.2 is based on the best
understood of the bootstrap sequential decoding techniques.
The basic hybrid bootstrap decoding algorithm is well
suited for hardware implementation, but initial simulation
results do not indicate any clear performance improvement
over concatenated convolutional/RS decoding, which is
somewhat simpler to implement.

3.3.1 Multiple Processors 

Hybrid bootstrap decoding performance could be
improved if the speed factor of the sequential decoder could
be effectively increased by factors greater than 2 without
significant cost increments. One approach with potential
promise is the utilization of multiple processors. Initial
simulations, discussed by Hofman and Odenwalder (Ref. 15),
demonstrated a reduction in performance. The difficulty
appears to lie in communication among the processors and,
in particular, in techniques for revising state information
and recognizing definitely decoded sections without
introducing errors. Each sequential decoder must be able to
accept changes in branch metric assignments without complete
initialization, without looking, and without significant
computational increases. Further work is indicated.*


*Section 3.3.2 was authored by Dr. F. Jelinek, a consultant
to LINKABIT on this study. He considers the application of
bootstrap techniques to Viterbi (trellis) decoding with
long constraint length codes.
3.3.2 Bootstrap Trellis Decoding

3.3.2.1 Description of Rudimentary Decoder

Let K be the constraint length of a convolutional
code, and let the constraint length of the corresponding
truncated trellis decoder be $\mu < K$ (i.e., the truncated
decoder has $2^{\mu-1}$ states per level). We will assume that K
is so large that the probability of error for maximum
likelihood decoding at the signal-to-noise ratio (SNR) used is
negligible compared to the probability of error resulting
from the scheme described below.

The rudimentary Bootstrap Trellis decoding algorithm
is as follows:

1. m-1 streams of binary data are encoded using
the constraint length K code, and an mth stream
is created by mod 2 position-by-position addition
of the m-1 streams.

2. The m streams are transmitted through the
channel, and the receiver creates an appropriate
state stream as in Bootstrap Sequential Decoding.

3. A $\mu$-truncated trellis decoder is used to decode
the first stream, with metrics based on the
corresponding received and state stream digits.
To each depth of the N-branch codeword there
correspond $2^{\mu-1}$ likelihoods, the maximum of these at
depth n being denoted by $L_n$. Let

$$
\bar L_n = \max_{1 \le i \le n} L_i ,
$$

so that $\bar L_n$ is a monotone non-decreasing function of
$n \in \{1, \dots, N\}$. If $\bar L_n - L_n < T$ for all n, the
decoder accepts the decoded first-stream information
sequence; otherwise it rejects it (in fact, it stops
decoding at the smallest depth n for which
$\bar L_n - L_n \ge T$).

4. If the 1st stream was accepted, it is replaced by
the estimated transmitted stream, the state stream is
recalculated accordingly, and the decoder proceeds to
decode the 2nd stream as in step 3, using a metric
table appropriate to m-1 undecoded streams.

5. If the 1st stream was rejected, 2nd-stream decoding
proceeds exactly as in step 3, with no change to
either the metric or the state stream.

6. Steps 3 through 5 establish a pattern that is
adhered to in general: after every acceptance, the
state stream and metrics are recalculated and decoding
of the next stream in "round robin" order begins.

7. Decoding terminates in one of two ways:

a) SUCCESS: all m streams are finally accepted.

b) FAILURE: when $\ell$ streams ($\ell \le m$) remain
undecoded, $\ell$ successive attempts at stream decoding
end with rejection.
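The control flow of steps 1 to 7 can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's decoder. The truncated trellis decoder is replaced by a hypothetical callback `attempt(i, k)`. The state stream is shown only in its hard-decision form. The sequence passed to `threshold_test` is assumed to hold the per-depth maxima $L_n$.

```python
from typing import Callable, List, Sequence, Tuple


def parity_stream(info_streams: Sequence[Sequence[int]]) -> List[int]:
    """Step 1: the mth stream is the position-by-position mod-2 sum of the m-1 streams."""
    return [sum(bits) % 2 for bits in zip(*info_streams)]


def state_stream(hard_decisions: Sequence[Sequence[int]]) -> List[int]:
    """Step 2 (hard decisions only): mod-2 sum of all m received streams.
    The transmitted streams sum to zero, so this equals the mod-2 sum of
    the channel errors at each position."""
    return [sum(bits) % 2 for bits in zip(*hard_decisions)]


def threshold_test(L: Sequence[float], T: float) -> Tuple[bool, int]:
    """Step 3: accept iff max_{i<=n} L_i - L_n < T at every depth n.
    Returns (True, N) on acceptance, else (False, first violating depth)."""
    running_max = float("-inf")
    for n, Ln in enumerate(L, start=1):
        running_max = max(running_max, Ln)
        if running_max - Ln >= T:
            return False, n
    return True, len(L)


def round_robin(m: int, attempt: Callable[[int, int], bool]) -> Tuple[str, List[int]]:
    """Steps 3-7. attempt(i, k) is a hypothetical callback that decodes
    stream i while k streams are undecoded and returns True on acceptance
    (metrics and state stream would then be recalculated)."""
    undecoded = list(range(m))
    pos = 0
    rejections = 0
    while undecoded:
        k = len(undecoded)
        i = undecoded[pos]
        if attempt(i, k):
            undecoded.pop(pos)          # stream i accepted
            rejections = 0
            if undecoded:
                pos %= len(undecoded)   # next stream in round-robin order
        else:
            rejections += 1
            if rejections == k:         # k successive rejections: FAILURE
                return "FAILURE", undecoded
            pos = (pos + 1) % k
    return "SUCCESS", []
```

For example, a callback that accepts only while more than two streams remain produces FAILURE with two streams left, the $\ell = 2$ case of step 7(b).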
3.3.2.2 Analytical Performance Estimates


Using simple arguments, analogous to the ones appearing
in the Bootstrap Sequential Decoding paper (Ref. 16), it is
possible to obtain bounds on the probability of DECODING
FAILURE, P(F).

Let $E_k(R)$ be the undetected-error probability exponent
corresponding to maximum likelihood decoding of the first of
k undecoded streams, using the received as well as the state
stream digits, when the convolutional code rate is R (the net
rate, taking into account the parity stream overhead, is
$(m-1)R/m$).


Then

$$
P(F) \le \max_{2 \le k \le m} \min\left\{ \left[N A_m\, 2^{-\mu E_m(R)}\right]^k ,\; N A_k\, 2^{-\mu E_k(R)} \right\},
$$
$$
P(F) \ge \max\left\{ \max_{3 \le k \le m} \left[N A_k\, 2^{-\mu E_k(R)}\right]^k ,\; N A_2\, 2^{-\mu E_2(R)} \right\}, \qquad (3.3.1)
$$

where $A_\ell$ is a monotonically increasing function of the number
of undecoded streams $\ell$ that depends on the rate R but varies
negligibly with $\mu$. From (3.3.1) it follows that

$$
E_{LB}(R) \le \lim_{\mu \to \infty} -\frac{1}{\mu} \log P(F) \le E_{UB}(R), \qquad (3.3.2)
$$

where estimates of both exponents are readily available.
In fact, let us assume that the combined expurgated and
random coding bound exponents are the true exponents.
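Then

$$
E_k(R) =
\begin{cases}
\rho, & \text{if the solution of } R = \frac{1}{\rho} E_0^k(\rho) \text{ is } \rho < 1,\\
\beta, & \text{if the solution of } R = \frac{1}{\beta} E_x^k(\beta) \text{ is } \beta > 1,\\
1, & \text{otherwise,}
\end{cases}
\qquad (3.3.3)
$$

where, for the binary-input, 2b-ary-output symmetric channel
and a binary state stream,

$$
E_0^k(\rho) = \rho - \log \sum_{v=1}^{b} \left\{ \left[ \big(w(0,v|0)\, q_{k-1}(0)\big)^{\frac{1}{1+\rho}} + \big(w(1,v|0)\, q_{k-1}(1)\big)^{\frac{1}{1+\rho}} \right]^{1+\rho} + \left[ \big(w(0,v|0)\, q_{k-1}(1)\big)^{\frac{1}{1+\rho}} + \big(w(1,v|0)\, q_{k-1}(0)\big)^{\frac{1}{1+\rho}} \right]^{1+\rho} \right\}.
$$

Above, the channel outputs are pairs $(u,v)$, $u \in \{0,1\}$,
$v \in \{1,\dots,b\}$, and the inputs are $x \in \{0,1\}$. The
transition probability $w(u,v|x)$ is symmetric:
$w(u,v|x) = w(u \oplus 1, v \,|\, x \oplus 1)$.

Furthermore,

$$
q_{k-1}(0) = \frac{1 + (1-2p)^{k-1}}{2}, \qquad
q_{k-1}(1) = \frac{1 - (1-2p)^{k-1}}{2}, \qquad
p = \sum_{v=1}^{b} w(1,v|0).
$$

Finally,

$$
E_x^k(\beta) = \beta - \beta \log\left[ 1 + \left( 4\sqrt{q_{k-1}(0)\, q_{k-1}(1)} \sum_{v=1}^{b} \sqrt{w(0,v|0)\, w(1,v|0)} \right)^{1/\beta} \right].
$$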
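To make (3.3.3) concrete, the sketch below evaluates $E_0^k(\rho)$ and solves $R = E_0^k(\rho)/\rho$. It covers only the simplest case of the formulas: hard decisions ($b = 1$, with $w(0,1|0) = 1-p$ and $w(1,1|0) = p$), so the channel reduces to a binary symmetric channel. Base-2 logarithms are assumed, to match the $2^{-\mu E}$ form of (3.3.1), and the function names are ours. The expurgated branch ($\beta > 1$) is not implemented. For rates at or below $R_0 = E_0^k(1)$, `exponent_k` therefore returns 1.0, which is only a lower bound there.

```python
import math
from typing import Tuple


def q_parity(p: float, k: int) -> Tuple[float, float]:
    """q_{k-1}(0), q_{k-1}(1): probability that the mod-2 sum of the errors
    on the other k-1 undecoded streams is 0 or 1 (BSC crossover p)."""
    d = (1.0 - 2.0 * p) ** (k - 1)
    return (1.0 + d) / 2.0, (1.0 - d) / 2.0


def e0_k(rho: float, p: float, k: int) -> float:
    """E_0^k(rho) in bits for b = 1: w(0,1|0) = 1 - p, w(1,1|0) = p."""
    q0, q1 = q_parity(p, k)
    s = 1.0 / (1.0 + rho)
    a = ((1.0 - p) * q0) ** s + (p * q1) ** s
    b = ((1.0 - p) * q1) ** s + (p * q0) ** s
    return rho - math.log2(a ** (1.0 + rho) + b ** (1.0 + rho))


def exponent_k(R: float, p: float, k: int, tol: float = 1e-10) -> float:
    """E_k(R) from (3.3.3), random-coding branch only: bisection on
    R = E_0^k(rho)/rho for rho in (0, 1). Returns 1.0 at or below R_0
    (expurgated branch not modelled) and 0.0 at or above capacity."""
    if R <= e0_k(1.0, p, k):
        return 1.0
    lo, hi = 1e-6, 1.0
    if R >= e0_k(lo, p, k) / lo:     # E_0^k(rho)/rho -> capacity as rho -> 0
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if e0_k(mid, p, k) / mid > R:  # E_0^k(rho)/rho decreases with rho
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For k = 1 the state stream pins down the remaining stream completely, and the sketch gives $E_0^1(\rho) = \rho$. As k grows, $q_{k-1}(0)$ and $q_{k-1}(1)$ approach 1/2, and $E_0^k$ approaches the ordinary BSC function, for which $E_0(1) = 1 - \log_2(1 + 2\sqrt{p(1-p)})$.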




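Clearly, from (3.3.1) and (3.3.2),

$$
E_{LB}(R) = \min_{2 \le k \le m} \max\left\{ k E_m(R),\; E_k(R) \right\},
$$
$$
E_{UB}(R) = \min\left\{ \min_{3 \le k \le m} k E_k(R),\; E_2(R) \right\}.
$$

To obtain parametric relations between $E_{LB}(R)$, $E_{UB}(R)$ and
R, we may proceed as follows. Define

$$
\bar E_k(\alpha) =
\begin{cases}
E_0^k(\alpha), & \alpha \le 1,\\
E_x^k(\alpha), & \alpha > 1.
\end{cases}
$$

By (3.3.3), $E_k(R) = \alpha$ exactly when $R = \bar E_k(\alpha)/\alpha$, and
$\bar E_k(\alpha)/\alpha$ decreases as $\alpha$ increases; in particular,
$k E_j(R) \ge \alpha$ exactly when $R \le (k/\alpha)\, \bar E_j(\alpha/k)$. Then

$$
E_{LB}(R) = \alpha \quad \text{for} \quad
R = \min_{2 \le k \le m} \max\left\{ \frac{k}{\alpha}\, \bar E_m\!\left(\frac{\alpha}{k}\right),\; \frac{1}{\alpha}\, \bar E_k(\alpha) \right\},
$$

and

$$
E_{UB}(R) = \alpha \quad \text{for} \quad
R = \min\left\{ \min_{3 \le k \le m} \frac{k}{\alpha}\, \bar E_k\!\left(\frac{\alpha}{k}\right),\; \frac{1}{\alpha}\, \bar E_2(\alpha) \right\}.
$$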
3.3.2.3 Refinements of the Decoding Algorithm


1. It is not necessary to reject the entire stream when the
threshold is violated (i.e., when $\bar L_\ell - L_\ell \ge T$ at some
depth $\ell$). Let $k \le \ell$ be such that $L_k = \bar L_\ell$, and let J be
an optimized integer. Then, when threshold violation occurs at
depth $\ell$, all decoded bits up to the $(k-J)$th one are accepted,
the corresponding state stream digits are recalculated, and on
subsequent attempts the appropriate metrics are used. The next
decoding of that stream then starts at position $k-J$, rather
than at position 1.

This pull-up strategy will occasionally introduce
errors into the accepted stream sections, so an error-
cleanup method must also be agreed on. There is further-
more the problem that J should probably increase with $\mu$,
but this may not be serious in the "reasonable" range $\mu < 10$.

2. If the pull-up strategy of (1) is used, then at the
end of the first m attempts, the lengths of the definitely
decoded sections will have a monotone increasing tendency, e.g.:


For this reason it might be useful to modify the round
robin strategy by next decoding backwards, starting with the
last not fully accepted stream and continuing with the next-
to-last stream, etc. After decoding the first stream in this
manner, decoding would start again in the forward direction, etc.
Unlike a true maximum likelihood decoder, a truncated decoder
is not symmetrical in both directions. Therefore, backward
decoding might avoid some errors made in the forward direction,
and vice versa. The code should be picked so that it is strong
in both directions.

3. More complex algebraic codes ought to be considered, such as
the three-group code.

4. A decoding failure does not mean that the entire block
must be thrown away. In fact, in the definitely decoded
sections there will probably be no errors whatever. When
FAILURE takes place, one should probably go one more round,
ignoring the threshold stopping rule and decoding each stream
to its end, simply accepting the admittedly unreliable decoder
decisions. With a systematic or quick-look-in code, one might
reconstruct the unreliable positions simply from the uncoded
received information bits.

5. The final stopping rule that would minimize the probability
of error in the pull-up mode (1) would be that failure results
when further decoding results in no enlargement of the definitely
accepted stream sections.

Alternatively, it might be desirable to fix the number of
allowed decoding attempts on each track, perform these,
and accept the last decisions regardless of whether additional
progress was being made or not. This approach might be
particularly useful if several truncated decoders working in
parallel were available. As one possibility, say 3 sets
of m decoders would work continuously on 3 successive
received blocks as follows:
   First set of decoders: works on the (i+2)th block
   independently, using the state stream, but each assuming
   that m streams are undecoded at all depths.
      -> The (i+1)th state stream information is updated
         according to the decoding decisions of the first
         decoder set.

   Second set of decoders: works on the (i+1)th block
   independently, but using all side information provided
   by the 1st set of decoders.
      -> The ith state stream information is updated
         according to the decisions of the 2nd decoder set.

   Third set of decoders: works independently on the ith
   block, using side information developed by the 2nd decoder
   set. Decisions are released as final to the user.
6. One way to avoid rejection is to increase the truncated
constraint length $\mu$. Thus, for example, in the rudimentary
decoding algorithm of Section 3.3.2.1, one would have a sequence
$\mu_1 < \mu_2 < \cdots$ of constraint lengths.
Starting with constraint length $\mu_1$, decoding of a block would
be attempted until either a SUCCESS or a FAILURE was declared.
In the latter case, decoding would begin again based on
constraint length $\mu_2$ and would continue until either a new
stream was accepted or another FAILURE resulted. In the former
case, decoders would revert to constraint length $\mu_1$; in the
latter case, the constraint length would be increased to $\mu_3$,
etc. FAILURE with the largest constraint length of the sequence
would be final.

This game could be played in a variety of ways. Another
possibility is to have a sequence of constraint lengths
$\mu_2 < \mu_3 < \cdots < \mu_m$ (m is the number of streams). Constraint
length $\mu_m$ is used until one stream is accepted or FAILURE is
declared. In the former case, constraint length $\mu_{m-1}$ will
be used until another stream is accepted or FAILURE is
declared.

7. It is not clear that the pull-up strategy of (1) is, in
the non-asymptotic case, more desirable than the rudimentary
one. Indeed, it may be much simpler to shorten the stream
length N and use the latter strategy.
8. The final possibility is to bootstrap on a concatenated
code. To stay simple, suppose an R-S code is used over GF(8)
that is single error correcting (not realistic, of course).
The algebraic rate then is 5/7.

Form 5 information streams and add 2 parity check streams
using the R-S relation. Next encode each of the streams by use
of a rate 1/2 convolutional code that has eight 6-bit branches
leaving each node. The matrix of the convolutional code will
have the partitioned form (K = 6 in this example):


    G0  0   0   0
    G1  0   0   0
    G2  G0  0   0
    G3  G1  0   0
    G4  G2  G0  0
    G5  G3  G1  0
    0   G4  G2  G0
    0   G5  G3  G1
    0   0   G4  G2
    0   0   G5  G3
    0   0   0   G4
    0   0   0   G5


Here the $G_i$ are 3x3 binary matrices that are restricted to
have the forms

    0 0 0
    0 0 0      or    T^n,   n = 1, 2, ..., 7,
    0 0 0

for a fixed 3x3 binary matrix T.
In this way, the $G_i$ are representations of the powers of the
primitive element over GF(8). It can be shown that the restriction
does not damage the error-correcting capabilities of the code,
at least not in the limit of large constraint lengths, K >> 1.

If a convolutional code of the indicated form is used, then
it can easily be shown that the corresponding branch triplets
of the 7 streams will have the R-S relationship. In fact, if we
number the digits as indicated,


    1 2 3   4 5 6   7 8 9   10 11 12   13 14 15   16 17 18   19 20 21


they will satisfy the parity-check matrix

    110 010 010 100 110 100 000
    111 101 101 010 111 010 000
    010 011 011 001 010 001 000
    000 110 010 010 100 110 100
    000 111 101 101 010 111 010
    000 010 011 011 001 010 001
Consequently, sets of these digits will be algebraically related,
and thus bootstrapping will be possible. If the bootstrapping
is to be simple, individual bits in the triplets should get
independent metrics. This is not rigorously possible, but from a
practical point of view it may be enough to provide three separate
parity-check equations, each involving only one digit of the
triplet. For digits 1, 2, 3 the parity checks are

    100 001 001 101 100 101 000
    010 100 110 100 000 110 010
    001 001 101 100 101 000 100


It would seem possible to keep state information for each digit
according to its parity block set and then perform ordinary single
parity bootstrapping. After the decoding of all streams is
completed, the R-S code can be used to correct remaining errors in
the estimated information streams. Of course, the H-matrix
contains the possibilities for 3-group codes, and even more
complicated ones, all of which might be worth investigating.
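The claim that three single-digit parity checks can be drawn from the H-matrix can be checked mechanically. The Python sketch below takes the six rows of H and the three checks for digits 1, 2, 3 as listed. It then searches all nonempty subsets of H's rows for a mod-2 sum equal to each check. Reading the matrix as six rows of seven triplets is our interpretation of the original layout.

```python
from typing import List, Optional, Tuple

# Rows of the 6 x 21 parity-check matrix H, grouped by stream triplet.
H_ROWS = [
    "110 010 010 100 110 100 000",
    "111 101 101 010 111 010 000",
    "010 011 011 001 010 001 000",
    "000 110 010 010 100 110 100",
    "000 111 101 101 010 111 010",
    "000 010 011 011 001 010 001",
]
# The three single-digit parity checks given for digits 1, 2, 3.
CHECKS = [
    "100 001 001 101 100 101 000",
    "010 100 110 100 000 110 010",
    "001 001 101 100 101 000 100",
]


def parse(row: str) -> List[int]:
    """Turn '110 010 ...' into a list of bits."""
    return [int(c) for c in row.replace(" ", "")]


def combination_of_rows(target: List[int], rows: List[List[int]]) -> Optional[Tuple[int, ...]]:
    """1-based indices of the rows whose mod-2 sum equals target
    (exhaustive search over all nonempty subsets), or None."""
    for mask in range(1, 1 << len(rows)):
        acc = [0] * len(target)
        for i, row in enumerate(rows):
            if mask >> i & 1:
                acc = [a ^ b for a, b in zip(acc, row)]
        if acc == target:
            return tuple(i + 1 for i in range(len(rows)) if mask >> i & 1)
    return None


rows = [parse(r) for r in H_ROWS]
for n, check in enumerate(CHECKS, start=1):
    bits = parse(check)
    print(f"check {n}: first triplet {bits[:3]}, sum of H rows {combination_of_rows(bits, rows)}")
```

The search finds that the three checks are the sums of rows 1 and 3, rows 3 and 5, and rows 1, 2 and 4, respectively. Each check touches exactly one digit of the first triplet. Because the six rows are linearly independent, these combinations are unique.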
4.0 Conclusions and Recommendations

Tables 1.0.1, 2.5.1, and 3.1.1 summarize the complexity 
of the concatenated and hybrid coding systems studied. It 
appears that the concatenated system is more cost effective 
for approximately the same performance as the hybrid systems. 
Its only drawback lies in the interleaving requirements which 
increase both the decoding delay and the gaps of non-data 
(mostly parity-check bits) by over an order of magnitude 
relative to the hybrid system. On the other hand, these 
probably are not a detriment for time-division multiplexed
users, and for systems where this is a problem, staggered 
interleaving will reduce the gap lengths to those of the 
hybrid system, possibly at a small cost in decoder complexity. 

It is therefore our conclusion that a concatenated
coding system utilizing a rate 1/3, constraint length 8
inner convolutional coder-Viterbi decoder, a 2048-bit
Reed-Solomon outer block coder-decoder, and 16 words of
interleaving, operating at 100 Kbps data rates, is implementable
in its entirety by a system employing approximately 333 TTL
integrated circuits. The coding gain at = 10 for this 
system is over 9 dB. Hybrid bootstrap sequential decoding
would require on the order of 50% more integrated circuits
of the ECL-MSI (such as MECL 10,000) logic family. Further-
more, the performance of the latter would vary with data
rate, being slightly superior (0.2 dB) at 10 Kbps and
somewhat inferior (1.0 dB) at 100 Kbps.
It should also be emphasized that the simulations of
the hybrid bootstrap system are not completely conclusive,
and hence the technical risk is much greater in this system.
Among its principal weaknesses is its sensitivity to AGC
inaccuracy and phase tracking errors, which has been shown
(Ref. 8) to be considerable even for ordinary sequential
decoding. In concatenated decoding, on the other hand,
these channel inaccuracies produce a moderate, known
degradation (Ref. 8) in the inner Viterbi decoder, which is
easily shown to carry over directly, and in almost the same
amount, to the overall coding scheme.

Otherwise, the performance of the two seemingly
radically different approaches is remarkably similar.
The basic reason, in retrospect, is that, as information
theory establishes, highly efficient communication over a
Gaussian channel requires extremely long block lengths.
The hybrid and concatenated systems considered here
utilize about the same "superblock" length (2 to 3 Kbits)
with highly efficient convolutional and block codes,
hence the similar performance.

One final advantage of implementing a concatenated
scheme is the essentially individual and self-justifying
nature of each portion of the system. Viterbi decoders at
these data rates already exist (albeit only for the less
powerful K=7, R=1/2 code and not for the K=8, R=1/3 code) and
could be inserted without procurement delay as the inner
decoders.





The outer Reed-Solomon coder-decoder could easily be
justified as a worthwhile development in its own right, since
such a powerful decoder (255 characters over GF(2^8) with
16-error-correction capability) has never been implemented in
hardware. Finally, even the relatively straightforward
interleaver could be justified by itself as a means of breaking
up burst errors in convolutional decoding. Thus, such a
development would produce multi-purpose components as well as
an integrated system which might once and for all conclude
the quest for the ultimate coding system for space
communications.
Appendix A


This appendix contains a Fortran and an assembly
language version of a partial R-S decoder program.
Normally the inputs to this program are:

1. J = the number of bits per R-S symbol.

2. E = the designed number of correctable errors.

3. The coefficients of the degree-J primitive
polynomial of the field element which generates
the field.

4. The 2E syndromes, represented as integers.

However, to avoid computing the syndromes by hand,
a few additional statements were added at the beginning
of the program to enable the computer to determine these
quantities from the error locations and error values.
Of course, in a real system the syndromes would be
computed from the received word. But the method used here
is more convenient for checking and timing the various
steps in the decoding operation. The program outputs are
the error locations and the error values.
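The "few additional statements" that derive the syndromes from chosen error locations and values can be sketched in Python. This is a minimal sketch, not the appendix program. It assumes the syndrome roots are $\alpha^1, \ldots, \alpha^{2E}$ and that the primitive polynomial is supplied as an integer bit mask including the $x^J$ term. The function names and the example polynomial $x^3 + x + 1$ are ours.

```python
from typing import Dict, List, Tuple


def gf_tables(J: int, prim_poly: int) -> Tuple[List[int], List[int]]:
    """Antilog/log tables for GF(2^J). prim_poly includes the x^J term,
    e.g. 0b1011 for x^3 + x + 1."""
    n = (1 << J) - 1
    exp = [0] * n
    log = [0] * (n + 1)
    x = 1
    for i in range(n):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & (1 << J):   # reduce modulo the primitive polynomial
            x ^= prim_poly
    return exp, log


def syndromes(J: int, E: int, prim_poly: int, errors: Dict[int, int]) -> List[int]:
    """S_j = sum_i e_i * alpha^(j * loc_i) for j = 1..2E, given an error
    pattern {location: nonzero error value}. Taking the roots as
    alpha^1 .. alpha^(2E) is an assumption."""
    exp, log = gf_tables(J, prim_poly)
    n = (1 << J) - 1
    result = []
    for j in range(1, 2 * E + 1):
        s = 0
        for loc, val in errors.items():
            s ^= exp[(log[val] + j * loc) % n]   # field multiply via logs, add via XOR
        result.append(s)
    return result
```

For example, with J = 3, E = 2 and a single unit error at location 2, the syndromes are $\alpha^2, \alpha^4, \alpha^6, \alpha^8 = \alpha$, which are the integers 4, 6, 5, 2 in this field.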

The Fortran version of this program contains numerous
comment cards describing the various steps in the program.
The assembly language version follows the same format as
the Fortran version. In addition to the basic IBM 1130
assembly language instructions, a few instructions unique
to the LINKABIT system have been used. A brief description
of these instructions is given in Table A.1.

Following Table A.1 is a listing of first the Fortran
and then the assembly language version of this program.

Table A.1 LINKABIT Supplement to the IBM 1130 Assembly Language Instruction Set.





RE PkVnUURUin OP THE ORIGINAL is IVOR 


O'! 

Q- 

CD 

1-4 

IfJ 


JMJ » 

i« - 

vt! 

;<0 r-t 

jj-s — 

i *• _j 

-J to 

if) <L H 

*m > » 

\CKi Ht 


ctm 

cl! 

u; 


i< ~ 

4-4 \L* 


vL ! 
H < 

<i 

< 1 


lOU j 
If . C •> 
Um -j 
|w •> VL 


!iw - 

— to 

LL m 


'£ -M 

«*! 
<T 


CO 

o 


& 

! o 


fc: 

‘.iJ 

l r 

; t — t 

‘O 

XJ 


CO ' 

x , 
°l 

X ; 


O 

Uj cj 
o sl 

>- 
CO 


p 


iJ| 

03! 

< ; 


x 

r-i 

r-. 1 

CM 

ro \ 
o ro 

CM «H 
» » 

w to 
o t~* 
CM - 
► CM, 
CM O' 
CM » ; 
t-4 H 

• o'; 

CM 0 ! 
M /} 

- CM 


CM 


s Q 

«-* i w •— 
ICC' UJ v,!j < 

u 
o 


j 

i<L cl 
CO LI US 


ci 

CM 


rl H 
•• «H 
«H «► 
o vf? 

r-f •> 
» r-l 

CM U> 
^ •> 
• ^ 
r% f— 

cr •* 


o o i 
h > rH J 
* •* i\ 
O «— t o 
CVOi4 

•> *H jo 

i » 

f h* ’O 


to «— 

H CVj 


c; a 


jwi 
Uj >LC 
d\ O - • ! • 

LL - ( LL ^ Vb H 

o ;a •* •* » 

U iL! ID CM it-< 


OTj 

o 

ujfe h> 
io • 
acH ~ 

Uiivi r- 
Uj-cL t-% 
UiiUJ — 
K :JL 


WlH > 


— !vo 

\C ! H 

«“( ■ w 

^ }J> 

e* u. 
m n 

to I 


CL 

O 

cr 

UL 

l u 


Ll CL 

crio 

0 I LL 

01 * o 

21 LL 
lu Lj • 
O U 
II U 


Ll tL 
O ,Cj 


o ;o 


(\i ri H 
\ \ ^ 
•J > O 
CL CL X 

^ S IL 


O t-l 
C I 

| CL Cl 

i U * 

LvD * 
t-4 r-l * * 


I 

O 

LL 

CL 


+ 

U 

CK\ 


Cj 
01 
Cl 

UJ 

— * 

^ M 


UJ 

LL 

-J 

aj 

<L 

I- 

CD 

O 


<L 


II 

CO 

S-* 


CM 

II 


l u 


\*C r-l « II il CM CM -r 


; u i- l» Il : ll 

ii ii <l < n a, a: u tr n cv 

; r k i- m ir 4 2 - 2 l jl 

#— ^ *X <1 OU J HH 


CJ 

LL O I 

u’iii w tHjcr 
^ ^ I 'VL 
^ ;r-< O l-» iti; 

I'j U X J \<c 
ii oL ii ii ; u 
4-1 r»> U 7 CL 
CL 2l fiL ^ 

CO -< LL O O U) L^ — H 


c 

_> 

< 

o 

o 


o 

>c 


HHOp 

II II 
^ C 

r-4 VU ►*- ,h 


i M 


to 


CM 

a 


|<VI - 
u o 

^ HI 


CO || ;f-1 
II —’ll 


O 

X 

ty 


cl ^ h ^ l 


U II 
-J X * 
LD c 


— X. 
- 1 -r iL O 

r* t !h* Lu 

ii HH;r ii 

Ml I ! J ^ 

o r>u> f-i 

Q X XH U 

O ll It M 

c\i r-t cv r*> HI 

o a a ~ 

<x <£[<£ e> 


H»^ r-4 j • ^ II II a L o c 

; cv ro cl cl I ;<r 

1 i Z «I U. 1 J iL 

i H 1 »-» X L { Zi 

! ; 

1 'Y ' u 

1 1 

C I X *1 

-J — J 

_J _J »-( 

H* — 

— J H> 

H 

Ui 

U) 

u 

Huh 

< — * •* 
VO 

r-f r- 

g 

tr4 

L H M 

H H» J 

1 

1 

1 

i 

i 

i 

I 

j 

i 

-109- 

i 

f 

: 


1 



1 

1 



Rf rRonuoRiii:* 01 mi v>kh.inai cav . i~is~i\x'K 




o i 

O Oil 

CV £2 


* I -J 

*-< * ; cs 

O CV,: I t~> 

^ + UJ - 

HX 3 X 

Cl H HH 
►“« II I — II 

— >v ^ X 

U. <1 o < 


t/i s_ ) 

r~\ Ui !“■< ! 

x 2 . fe.: _» i 

i m < •-< a » j 

+ >-« a: *-i { 

rS II I C It } 

X — f-i V< i 
XHC. V- to 
J U. X VJ ^ H 

II X *«J U <’■“* 

H - UJ H 

cl — ; h- 

X U I </l 

w ^ ^ a. cj i 


a. h o 

J 2 . + X i LL 

*H -J ! 
j» «-* >- 1 oo 

|r-t a- ^ I «L j 

X *-* *-i i o f 

h- |H J »| .w | 

U U \ — V- I 

<x Hi m i 

u i * -J — G f 

^ *H — O') ci! O O rH i 

- "3 + * >-* O fl Jit VI 1 

*** w Ur ^ j 

it > m a: * o o a ; 

'i a. a h u L’ j artxu: 

ZZST^JUi^CO^:^^ j *~4 

cr ^ h-» i ii 4 h cu uj ui ! it 

^ Ii — X ^ H h* w 1 «*- 

M w ^ w U r- 

u it ii ~ <vi r*> o ii a/> 

li x x x yj u ~t x x x H x c* 


o <v joj 
cr: •>*•> 
a:cb 
uob 
l: «-i ca 


OOUiXrHW H H H (M 
ii in ^ ^ c v ^ ^ 

^ ^ h i;; q r IV H H 


V-I H h< 


^ u x x ii u 

it c\i *~* *-< CV 

^ w V Q U 


L H w ^ t-i U 


r-t I-. ^ ^ 


CC 2 -J eZ k -3 ^ L ii ^ fi , < si 


} fe* i 

:C O C i 

w u c ! 

rv s i 


U *H j** ►* 

i 

o o 
:o o 

!«-* CW 


- 110 - 


I 


I 



Rf PRODUCT IBIll T> Of- I Ml 


OK'KvfN Al f’Ai 4 RS POOR 





bj i 

u? i 

< 

ft. I 


o 

o 

.<> 


o 

o 

ro 


o 

o 

4 “ 

cH 


r-4 
t— i LJ 

<u- '<Z 

H4 Hi 

- ill 

X X 


JO 

‘O 

iro 


r- 1 

£3 


h-i 
+ ’ :Li 

xx o 

JU<c 

.1 Lw H 

n II r 

O -X -i 
<X uj ; 3 

•— * #— f iLJ 


O 

o 
o 
?o 

«► 

o jjc 

r-»!D CM 

o u s. 

OJ _J »-« 

•’Ioj —i 
o o | * 

OW «-* 
II o!;z li 
r<"i »-t 
.< !W 

o »■* itr o 

►-i <cj O 

L'i I hi VI 

a. h h a 


e 

> 


LJ 

f 

IC9 


:#■* O ; 
it 

;ro - 
c: o 

vh 'xj or 

3 rr 


!x 

;>-* 

Ui 

i«Z o 
o 
SL IO 
jet CM 
Ui • 
o 

ui O 
O O 
40 

P * 

o 
o 
r-C fO 
fe- CM 


O 

-i 

c 

3 

tc 


i 

QC j<M 

Ui te. 

x :>-? 

h- !_i 


CJ 


+ :n 

i'- 

ll 


O * •> 
X jr* 
Cill 


x >- o 

HJ 1 ui 


u.ro 
z: a: 


o 

o 

cr 


!° 

!o 

;o 

CM 


V) 

u 

H 


IX O 

CO rH 
O 
CM 

u 


II t-f 

s: 

Li;.z: .J 
t-4 «u. j I 
I- <— X 
^ f<> I— I — 
OX* X 

u w|r* h 

VJ 

Li O 
*- o 
CM 


tr 

O !© 

x r- 


cr 

«-i »-» 

+ li 


<n 

«i. 


o 
o 
u; 
ro 
Z * 
—> o 
* o 
CM L> 

ro 
z •> 
c o 

X c 
H 

to 

CO 


fOf 

JO 


‘ui 

13 


o 

o 


CM 


CM 


LU CV 
01 ! 

1L> 
X Li 
U . 

<X O 
X o 
ro 


;o lO 
O Ic: X* 
a: [O j j 
acH * 

Ui f <1 CM 
£*• (i HH M rl V • 

w i- h 

;Z Z CM i hk — 
J fl C£ O X 


!0 ^ 

Ui H- 

itt. UJ 

L 2 

| . 1 W 

I* -J 

jCV I 




Li CM 

X X 

Hi *h 


C> 


is 

icr 

CM 


fill- 


</; 

Ui o 
r- o 
o 
ro 
o 


JcC O 

u ^ 

I 

; 3 

Vi II 
Vi ti 

ix -I 

•J 

© 

fcs o 


JO 

* i 

O I 


ro | 
ro j 
+ I 
c I 

© |m 

ro i£ 

Ki jh 

Li 

X jX 

3 u 
— II 


o 

<M 

ro 

ro 

•> 

o 
in 
4- 
o ro 
d * 
X o 
Ui CM 
fO 
•» ro 


li 


a SL 


lo 

K> 

ro 

Ki 

to 

l^> 

p 

& 

ro 


u 


»-» ro 
o — ct: 
a> r*> 
cr 

hi 

^ a 

f -> . k 

C . 


hi f-i 
X II 
Hf *r-« 


*-» I ^ 
-I X r-» 


X 
tH O 
+ Ui 
X If 3 


-# lj x j-"* z: 


Lj I i iu« 


ii 

X x 


Ti L 

H r^f 
II O 5 m 
X Z X 


z 

o 


M 


[Fortran program listing, original pages 112 to 114. The original pages reproduced poorly, and the listing is illegible in this copy.]



The following pages give an assembly-language
version of the preceding program. In this version the
input and output statements are written in Fortran, and
the rest of the program is an assembly-language subprogram.